Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 70227 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 4.3 MiB |
| Average record size in memory | 64.0 B |
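The two memory figures can be cross-checked with a rough estimate. The sketch below assumes each cell occupies one 8-byte slot (a 64-bit number or an object pointer). That per-cell size is an assumption, not something the report states, and it ignores the string payloads themselves.

```python
# Rough cross-check of the memory figures in the table above.
# ASSUMPTION: each cell occupies one 8-byte slot (64-bit number or object pointer).
N_ROWS = 70227
N_COLS = 8
BYTES_PER_CELL = 8  # assumed, not stated by the report

record_bytes = N_COLS * BYTES_PER_CELL
total_mib = N_ROWS * record_bytes / 2**20

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the estimate agrees with the reported 64.0 B per record and 4.3 MiB in total.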
Variable types
| Numeric | 1 |
|---|---|
| Categorical | 7 |
Alerts
country has a high cardinality: 127 distinct values | High cardinality |
coverage has a high cardinality: 10044 distinct values | High cardinality |
issn has a high cardinality: 37076 distinct values | High cardinality |
publisher has a high cardinality: 11095 distinct values | High cardinality |
title has a high cardinality: 68293 distinct values | High cardinality |
country is highly imbalanced (55.1%) | Imbalance |
coverage is highly imbalanced (53.4%) | Imbalance |
publisher is highly imbalanced (52.0%) | Imbalance |
title is uniformly distributed | Uniform |
sourceid has unique values | Unique |
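The imbalance percentages above are not defined in the report. One common way to quantify imbalance is 1 minus the normalised Shannon entropy of the value counts. Whether the report uses exactly this formula is an assumption here. The sketch applies it to `region`, whose complete counts appear in its Common Values table.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for a uniform column, 1 when one class holds everything."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts for `region`, copied from its Common Values table in this report.
region_counts = [37936, 21255, 4399, 3397, 1176, 892, 792, 240, 140]
score = imbalance_score(region_counts)
print(f"region imbalance score: {score:.3f}")
```

Under this definition `region` scores below one half, which would be consistent with it not appearing among the imbalance alerts.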
Reproduction
| Analysis started | 2023-05-04 15:12:09.807657 |
|---|---|
| Analysis finished | 2023-05-04 15:12:13.789986 |
| Duration | 3.98 seconds |
| Software version | ydata-profiling v4.1.2 |
| Download configuration | config.json |
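The reported duration follows directly from the two timestamps. A minimal check, using only the values shown in the table:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-05-04 15:12:09.807657", FMT)
finished = datetime.strptime("2023-05-04 15:12:13.789986", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```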
sourceid
Real number (ℝ)
| Distinct | 70227 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3699065 × 10¹⁰ |
| Minimum | 12000 |
|---|---|
| Maximum | 2.110106 × 10¹⁰ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 548.8 KiB |
Quantile statistics
| Minimum | 12000 |
|---|---|
| 5-th percentile | 16616.3 |
| Q1 | 98868 |
| median | 1.9900194 × 10¹⁰ |
| Q3 | 2.1100457 × 10¹⁰ |
| 95-th percentile | 2.1100932 × 10¹⁰ |
| Maximum | 2.110106 × 10¹⁰ |
| Range | 2.1101048 × 10¹⁰ |
| Interquartile range (IQR) | 2.1100358 × 10¹⁰ |
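The range and IQR rows are derived from the other quantiles. The sketch below recomputes them from the rounded values in the table, so agreement is only expected to the displayed precision.

```python
# Values as displayed in the quantile table (already rounded by the report).
minimum = 12000
q1 = 98868
q3 = 2.1100457e10
maximum = 2.110106e10

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(f"IQR   = {iqr:.7e}")
print(f"Range = {value_range:.7e}")
```

The jump from Q1 (98868) to the median (about 1.99 × 10¹⁰) suggests that the identifiers fall into two widely separated ranges, so the IQR is almost as large as the full range.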
Descriptive statistics
| Standard deviation | 9.3402116 × 10⁹ |
|---|---|
| Coefficient of variation (CV) | 0.68181378 |
| Kurtosis | -1.472561 |
| Mean | 1.3699065 × 10¹⁰ |
| Median Absolute Deviation (MAD) | 1.2006955 × 10⁹ |
| Skewness | -0.64060688 |
| Sum | 9.6204427 × 10¹⁴ |
| Variance | 8.7239552 × 10¹⁹ |
| Monotonicity | Strictly increasing |
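Several of these statistics are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A short consistency check on the rounded figures above:

```python
n = 70227            # number of observations
mean = 1.3699065e10
std = 9.3402116e9

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum of all values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
```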
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 12000 | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032773 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032772 × 10¹⁰ | 1 | < 0.1% |
| 2.110032823 × 10¹⁰ | 1 | < 0.1% |
| Other values (70217) | 70217 |
| Value | Count | Frequency (%) |
| 12000 | 1 | < 0.1% |
| 12001 | 1 | < 0.1% |
| 12002 | 1 | < 0.1% |
| 12004 | 1 | < 0.1% |
| 12005 | 1 | < 0.1% |
| 12006 | 1 | < 0.1% |
| 12007 | 1 | < 0.1% |
| 12008 | 1 | < 0.1% |
| 12009 | 1 | < 0.1% |
| 12010 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2.110105979 × 10¹⁰ | 1 | < 0.1% |
| 2.110105978 × 10¹⁰ | 1 | < 0.1% |
| 2.110105978 × 10¹⁰ | 1 | < 0.1% |
| 2.110105949 × 10¹⁰ | 1 | < 0.1% |
| 2.11010593 × 10¹⁰ | 1 | < 0.1% |
| 2.11010593 × 10¹⁰ | 1 | < 0.1% |
| 2.110105901 × 10¹⁰ | 1 | < 0.1% |
| 2.110105901 × 10¹⁰ | 1 | < 0.1% |
| 2.110105897 × 10¹⁰ | 1 | < 0.1% |
| 2.110105896 × 10¹⁰ | 1 | < 0.1% |
country
Categorical
HIGH CARDINALITY IMBALANCE
| Distinct | 127 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| United States | 37138 |
|---|---|
| United Kingdom | 8812 |
| Netherlands | 3481 |
| Germany | 2712 |
| Switzerland | 1167 |
| Other values (122) | 16917 |
Length
| Max length | 22 |
|---|---|
| Median length | 13 |
| Mean length | 11.377476 |
| Min length | 4 |
Characters and Unicode
| Total characters | 799006 |
|---|---|
| Distinct characters | 50 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
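As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one value from this column. It is an independent example and not necessarily how the report computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in `text` by Unicode general category (e.g. 'Lu', 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("United States")
print(counts)
```

Here `Lu`, `Ll` and `Zs` stand for Uppercase Letter, Lowercase Letter and Space Separator, the three categories reported for this variable.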
Unique
| Unique | 18 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | United States |
|---|---|
| 2nd row | United States |
| 3rd row | United States |
| 4th row | United States |
| 5th row | United States |
Common Values
| Value | Count | Frequency (%) |
| United States | 37138 | 52.9% |
| United Kingdom | 8812 | 12.5% |
| Netherlands | 3481 | 5.0% |
| Germany | 2712 | 3.9% |
| Switzerland | 1167 | 1.7% |
| China | 1150 | 1.6% |
| France | 1136 | 1.6% |
| Italy | 1058 | 1.5% |
| Spain | 967 | 1.4% |
| Japan | 862 | 1.2% |
| Other values (117) | 11744 | 16.7% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| united | 46079 | |
| states | 37138 | |
| kingdom | 8812 | 7.4% |
| netherlands | 3481 | 2.9% |
| germany | 2712 | 2.3% |
| switzerland | 1167 | 1.0% |
| china | 1150 | 1.0% |
| france | 1136 | 1.0% |
| italy | 1058 | 0.9% |
| spain | 967 | 0.8% |
| Other values (136) | 14679 | 12.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 129585 | |
| e | 101648 | |
| n | 73646 | |
| i | 66270 | |
| a | 65217 | |
| d | 63365 | |
| (space) | 48152 | 6.0% |
| U | 46202 | 5.8% |
| s | 43443 | 5.4% |
| S | 40698 | 5.1% |
| Other values (40) | 120780 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 632496 | |
| Uppercase Letter | 118358 | 14.8% |
| Space Separator | 48152 | 6.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 129585 | |
| e | 101648 | |
| n | 73646 | |
| i | 66270 | |
| a | 65217 | |
| d | 63365 | |
| s | 43443 | 6.9% |
| r | 14470 | 2.3% |
| o | 13760 | 2.2% |
| m | 12645 | 2.0% |
| Other values (16) | 48447 | 7.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 46202 | |
| S | 40698 | |
| K | 9274 | 7.8% |
| N | 3787 | 3.2% |
| C | 2931 | 2.5% |
| G | 2880 | 2.4% |
| I | 2386 | 2.0% |
| F | 1932 | 1.6% |
| P | 1474 | 1.2% |
| R | 1350 | 1.1% |
| Other values (13) | 5444 | 4.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 48152 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 750854 | |
| Common | 48152 | 6.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 129585 | |
| e | 101648 | |
| n | 73646 | |
| i | 66270 | |
| a | 65217 | |
| d | 63365 | |
| U | 46202 | 6.2% |
| s | 43443 | 5.8% |
| S | 40698 | 5.4% |
| r | 14470 | 1.9% |
| Other values (39) | 106310 |
Common
| Value | Count | Frequency (%) |
| (space) | 48152 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 799006 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 129585 | |
| e | 101648 | |
| n | 73646 | |
| i | 66270 | |
| a | 65217 | |
| d | 63365 | |
| (space) | 48152 | 6.0% |
| U | 46202 | 5.8% |
| s | 43443 | 5.4% |
| S | 40698 | 5.1% |
| Other values (40) | 120780 |
coverage
Categorical
HIGH CARDINALITY IMBALANCE
| Distinct | 10044 |
|---|---|
| Distinct (%) | 14.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| None | 33355 |
|---|---|
| 2008-2021 | 789 |
| 2009-2021 | 767 |
| 2010-2021 | 750 |
| 2018-2021 | 739 |
| Other values (10039) | 33827 |
Length
| Max length | 303 |
|---|---|
| Median length | 213 |
| Mean length | 9.1563786 |
| Min length | 4 |
Characters and Unicode
| Total characters | 643025 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 8447 |
|---|---|
| Unique (%) | 12.0% |
Sample
| 1st row | 1999-2003, 2005, 2008 |
|---|---|
| 2nd row | 1958-2021 |
| 3rd row | 1969-2021 |
| 4th row | 2000-2021 |
| 5th row | 1988-2021 |
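The sample rows show that a coverage value is a comma-separated list of single years and inclusive year ranges. A minimal parser for that format might look like the sketch below. The function name `parse_coverage` is hypothetical, and treating the string `None` as "no coverage" is an assumption based on the Common Values table.

```python
def parse_coverage(value):
    """Expand a coverage string such as '1999-2003, 2005, 2008' into a sorted list of years."""
    if value.strip() == "None":  # ASSUMPTION: placeholder string meaning "no coverage"
        return []
    years = set()
    for part in value.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range such as '1999-2003'.
            start, end = (int(p) for p in part.split("-"))
            years.update(range(start, end + 1))
        else:
            years.add(int(part))
    return sorted(years)

print(parse_coverage("1999-2003, 2005, 2008"))
```

Parsing the column this way would turn a 10044-value categorical into numeric features such as first year, last year or number of covered years.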
Common Values
| Value | Count | Frequency (%) |
| None | 33355 | 47.5% |
| 2008-2021 | 789 | 1.1% |
| 2009-2021 | 767 | 1.1% |
| 2010-2021 | 750 | 1.1% |
| 2018-2021 | 739 | 1.1% |
| 2019-2021 | 703 | 1.0% |
| 1996-2021 | 694 | 1.0% |
| 2011-2021 | 689 | 1.0% |
| 2020-2021 | 657 | 0.9% |
| 2017-2021 | 640 | 0.9% |
| Other values (10034) | 30444 | 43.4% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| none | 33355 | |
| 1996-2021 | 1049 | 1.1% |
| 2008-2021 | 927 | 1.0% |
| 2009-2021 | 912 | 1.0% |
| 2010-2021 | 863 | 0.9% |
| 2020-2021 | 850 | 0.9% |
| 2018-2021 | 847 | 0.9% |
| 2019-2021 | 813 | 0.9% |
| 2011-2021 | 811 | 0.9% |
| 2017-2021 | 778 | 0.8% |
| Other values (2621) | 50924 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 102463 | |
| 0 | 96451 | |
| 1 | 89218 | |
| 9 | 62377 | |
| - | 46141 | |
| o | 33355 | 5.2% |
| N | 33355 | 5.2% |
| e | 33355 | 5.2% |
| n | 33355 | 5.2% |
| , | 21902 | 3.4% |
| Other values (7) | 91053 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 419660 | |
| Lowercase Letter | 100065 | 15.6% |
| Dash Punctuation | 46141 | 7.2% |
| Uppercase Letter | 33355 | 5.2% |
| Other Punctuation | 21902 | 3.4% |
| Space Separator | 21902 | 3.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 102463 | |
| 0 | 96451 | |
| 1 | 89218 | |
| 9 | 62377 | |
| 8 | 18387 | 4.4% |
| 7 | 14777 | 3.5% |
| 6 | 11980 | 2.9% |
| 5 | 8512 | 2.0% |
| 4 | 8154 | 1.9% |
| 3 | 7341 | 1.7% |
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 33355 | |
| e | 33355 | |
| n | 33355 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 46141 |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 33355 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 21902 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 21902 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 509605 | |
| Latin | 133420 | 20.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 102463 | |
| 0 | 96451 | |
| 1 | 89218 | |
| 9 | 62377 | |
| - | 46141 | |
| , | 21902 | 4.3% |
| (space) | 21902 | 4.3% |
| 8 | 18387 | 3.6% |
| 7 | 14777 | 2.9% |
| 6 | 11980 | 2.4% |
| Other values (3) | 24007 | 4.7% |
Latin
| Value | Count | Frequency (%) |
| o | 33355 | |
| N | 33355 | |
| e | 33355 | |
| n | 33355 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 643025 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 102463 | |
| 0 | 96451 | |
| 1 | 89218 | |
| 9 | 62377 | |
| - | 46141 | |
| o | 33355 | 5.2% |
| N | 33355 | 5.2% |
| e | 33355 | 5.2% |
| n | 33355 | 5.2% |
| , | 21902 | 3.4% |
| Other values (7) | 91053 |
issn
Categorical
| Distinct | 37076 |
|---|---|
| Distinct (%) | 52.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| - | 33072 |
|---|---|
| 10716947 | 9 |
| 15417719 | 7 |
| 09353224 | 3 |
| 1038412X | 3 |
| Other values (37071) | 37133 |
Length
| Max length | 38 |
|---|---|
| Median length | 28 |
| Mean length | 7.2422573 |
| Min length | 1 |
Characters and Unicode
| Total characters | 508602 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 37009 |
|---|---|
| Unique (%) | 52.7% |
Sample
| 1st row | 15276228 |
|---|---|
| 2nd row | 00225002, 19383711 |
| 3rd row | 00225061, 15206696 |
| 4th row | 15299740, 15299732 |
| 5th row | 15736598, 08949867 |
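The samples show ISSNs stored without the usual hyphen, with several ISSNs per cell separated by commas and `-` standing in for "none". The sketch below splits such a cell and verifies the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulo 11, with `X` for 10). The helper names are hypothetical.

```python
def is_valid_issn(issn):
    """Check the ISSN check digit for an 8-character ISSN written without a hyphen."""
    issn = issn.strip().upper()
    if len(issn) != 8 or any(c not in "0123456789" for c in issn[:7]):
        return False
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    weighted = sum(int(d) * w for d, w in zip(issn[:7], range(8, 1, -1)))
    check = (11 - weighted % 11) % 11
    expected = "X" if check == 10 else str(check)
    return issn[7] == expected

def split_issns(cell):
    """Split a cell such as '00225002, 19383711' into individual ISSNs ('-' means none)."""
    return [] if cell.strip() == "-" else [p.strip() for p in cell.split(",")]

for cell in ["15276228", "00225002, 19383711", "1038412X", "-"]:
    print(cell, [is_valid_issn(i) for i in split_issns(cell)])
```

Splitting multi-ISSN cells before counting would also change the cardinality figures, since the same ISSN can appear alone in one row and paired in another.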
Common Values
| Value | Count | Frequency (%) |
| - | 33072 | 47.1% |
| 10716947 | 9 | < 0.1% |
| 15417719 | 7 | < 0.1% |
| 09353224 | 3 | < 0.1% |
| 1038412X | 3 | < 0.1% |
| 10503862 | 2 | < 0.1% |
| 15938883, 11296569 | 2 | < 0.1% |
| 10672478 | 2 | < 0.1% |
| 07347464 | 2 | < 0.1% |
| 10928138 | 2 | < 0.1% |
| Other values (37066) | 37123 | 52.9% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 33072 | ||
| 10716947 | 9 | < 0.1% |
| 15417719 | 7 | < 0.1% |
| 16608151 | 3 | < 0.1% |
| 13474065 | 3 | < 0.1% |
| 00214922 | 3 | < 0.1% |
| 1038412x | 3 | < 0.1% |
| 09353224 | 3 | < 0.1% |
| 0148396x | 2 | < 0.1% |
| 14320681 | 2 | < 0.1% |
| Other values (54838) | 54949 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 63861 | |
| 0 | 59077 | |
| 2 | 48805 | |
| 5 | 40128 | |
| 3 | 39654 | |
| 7 | 38534 | |
| 4 | 38129 | |
| 9 | 36273 | |
| 6 | 35512 | |
| 8 | 34913 | |
| Other values (4) | 73716 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 434886 | |
| Dash Punctuation | 33072 | 6.5% |
| Other Punctuation | 17829 | 3.5% |
| Space Separator | 17829 | 3.5% |
| Uppercase Letter | 4986 | 1.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 63861 | |
| 0 | 59077 | |
| 2 | 48805 | |
| 5 | 40128 | |
| 3 | 39654 | |
| 7 | 38534 | |
| 4 | 38129 | |
| 9 | 36273 | |
| 6 | 35512 | |
| 8 | 34913 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 33072 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 17829 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 17829 | 100.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| X | 4986 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 503616 | |
| Latin | 4986 | 1.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 63861 | |
| 0 | 59077 | |
| 2 | 48805 | |
| 5 | 40128 | |
| 3 | 39654 | |
| 7 | 38534 | |
| 4 | 38129 | |
| 9 | 36273 | |
| 6 | 35512 | |
| 8 | 34913 | |
| Other values (3) | 68730 |
Latin
| Value | Count | Frequency (%) |
| X | 4986 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 508602 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 63861 | |
| 0 | 59077 | |
| 2 | 48805 | |
| 5 | 40128 | |
| 3 | 39654 | |
| 7 | 38534 | |
| 4 | 38129 | |
| 9 | 36273 | |
| 6 | 35512 | |
| 8 | 34913 | |
| Other values (4) | 73716 |
publisher
Categorical
HIGH CARDINALITY IMBALANCE
| Distinct | 11095 |
|---|---|
| Distinct (%) | 15.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| None | 33944 |
|---|---|
| Taylor and Francis Ltd. | 1483 |
| Elsevier BV | 771 |
| Routledge | 731 |
| Elsevier | 638 |
| Other values (11090) | 32660 |
Length
| Max length | 158 |
|---|---|
| Median length | 144 |
| Mean length | 15.862574 |
| Min length | 3 |
Characters and Unicode
| Total characters | 1113981 |
|---|---|
| Distinct characters | 77 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 8493 |
|---|---|
| Unique (%) | 12.1% |
Sample
| 1st row | Columbus State University |
|---|---|
| 2nd row | Wiley-Blackwell |
| 3rd row | John Wiley & Sons Inc. |
| 4th row | Routledge |
| 5th row | Wiley-Blackwell |
Common Values
| Value | Count | Frequency (%) |
| None | 33944 | 48.3% |
| Taylor and Francis Ltd. | 1483 | 2.1% |
| Elsevier BV | 771 | 1.1% |
| Routledge | 731 | 1.0% |
| Elsevier | 638 | 0.9% |
| Wiley-Blackwell Publishing Ltd | 638 | 0.9% |
| SAGE Publications Inc. | 516 | 0.7% |
| Springer Verlag | 511 | 0.7% |
| Elsevier Ltd. | 473 | 0.7% |
| Emerald Group Publishing Ltd. | 447 | 0.6% |
| Other values (11085) | 30075 | 42.8% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| none | 33944 | 20.6% |
| ltd | 6130 | 3.7% |
| of | 5689 | 3.5% |
| and | 4038 | 2.4% |
| university | 3672 | 2.2% |
| publishing | 3496 | 2.1% |
| inc | 2902 | 1.8% |
| press | 2755 | 1.7% |
| elsevier | 2522 | 1.5% |
| de | 2212 | 1.3% |
| Other values (10581) | 97505 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 115032 | 10.3% |
| n | 97400 | 8.7% |
| (space) | 94648 | 8.5% |
| o | 82933 | 7.4% |
| i | 82401 | 7.4% |
| a | 59831 | 5.4% |
| r | 53928 | 4.8% |
| s | 50889 | 4.6% |
| t | 48634 | 4.4% |
| l | 43017 | 3.9% |
| Other values (67) | 385268 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 836450 | |
| Uppercase Letter | 164894 | 14.8% |
| Space Separator | 94648 | 8.5% |
| Other Punctuation | 13912 | 1.2% |
| Dash Punctuation | 2155 | 0.2% |
| Open Punctuation | 853 | 0.1% |
| Close Punctuation | 849 | 0.1% |
| Math Symbol | 132 | < 0.1% |
| Decimal Number | 87 | < 0.1% |
| Connector Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 115032 | |
| n | 97400 | |
| o | 82933 | |
| i | 82401 | |
| a | 59831 | 7.2% |
| r | 53928 | 6.4% |
| s | 50889 | 6.1% |
| t | 48634 | 5.8% |
| l | 43017 | 5.1% |
| c | 36358 | 4.3% |
| Other values (16) | 166027 |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 37339 | |
| S | 13843 | 8.4% |
| P | 13758 | 8.3% |
| A | 9717 | 5.9% |
| E | 9594 | 5.8% |
| I | 9494 | 5.8% |
| L | 8824 | 5.4% |
| C | 7841 | 4.8% |
| M | 6371 | 3.9% |
| B | 6000 | 3.6% |
| Other values (16) | 42113 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 10725 | |
| , | 1419 | 10.2% |
| ; | 537 | 3.9% |
| & | 535 | 3.8% |
| ' | 460 | 3.3% |
| / | 168 | 1.2% |
| " | 45 | 0.3% |
| : | 22 | 0.2% |
| * | 1 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 29 | |
| 1 | 26 | |
| 5 | 12 | |
| 0 | 7 | 8.0% |
| 3 | 5 | 5.7% |
| 4 | 5 | 5.7% |
| 2 | 2 | 2.3% |
| 9 | 1 | 1.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 852 | |
| [ | 1 | 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 848 | |
| ] | 1 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 94648 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2155 |
Math Symbol
| Value | Count | Frequency (%) |
| + | 132 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1001344 | |
| Common | 112637 | 10.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 115032 | 11.5% |
| n | 97400 | 9.7% |
| o | 82933 | 8.3% |
| i | 82401 | 8.2% |
| a | 59831 | 6.0% |
| r | 53928 | 5.4% |
| s | 50889 | 5.1% |
| t | 48634 | 4.9% |
| l | 43017 | 4.3% |
| N | 37339 | 3.7% |
| Other values (42) | 329940 |
Common
| Value | Count | Frequency (%) |
| (space) | 94648 | 84.0% |
| . | 10725 | 9.5% |
| - | 2155 | 1.9% |
| , | 1419 | 1.3% |
| ( | 852 | 0.8% |
| ) | 848 | 0.8% |
| ; | 537 | 0.5% |
| & | 535 | 0.5% |
| ' | 460 | 0.4% |
| / | 168 | 0.1% |
| Other values (15) | 290 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1113981 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 115032 | 10.3% |
| n | 97400 | 8.7% |
| (space) | 94648 | 8.5% |
| o | 82933 | 7.4% |
| i | 82401 | 7.4% |
| a | 59831 | 5.4% |
| r | 53928 | 4.8% |
| s | 50889 | 4.6% |
| t | 48634 | 4.4% |
| l | 43017 | 3.9% |
| Other values (67) | 385268 |
region
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| Northern America | 37936 |
|---|---|
| Western Europe | 21255 |
| Asiatic Region | 4399 |
| Eastern Europe | 3397 |
| Latin America | 1176 |
| Other values (4) | 2064 |
Length
| Max length | 18 |
|---|---|
| Median length | 16 |
| Mean length | 15.006166 |
| Min length | 6 |
Characters and Unicode
| Total characters | 1053838 |
|---|---|
| Distinct characters | 27 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Northern America |
|---|---|
| 2nd row | Northern America |
| 3rd row | Northern America |
| 4th row | Northern America |
| 5th row | Northern America |
Common Values
| Value | Count | Frequency (%) |
| Northern America | 37936 | 54.0% |
| Western Europe | 21255 | 30.3% |
| Asiatic Region | 4399 | 6.3% |
| Eastern Europe | 3397 | 4.8% |
| Latin America | 1176 | 1.7% |
| Middle East | 892 | 1.3% |
| Pacific Region | 792 | 1.1% |
| Africa | 240 | 0.3% |
| Africa/Middle East | 140 | 0.2% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| america | 39112 | |
| northern | 37936 | |
| europe | 24652 | |
| western | 21255 | |
| region | 5191 | 3.7% |
| asiatic | 4399 | 3.1% |
| eastern | 3397 | 2.4% |
| latin | 1176 | 0.8% |
| east | 1032 | 0.7% |
| middle | 892 | 0.6% |
| Other values (3) | 1172 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 164668 | |
| e | 153830 | |
| (space) | 69987 | 6.6% |
| t | 69195 | 6.6% |
| n | 68955 | 6.5% |
| o | 67779 | 6.4% |
| i | 57273 | 5.4% |
| a | 50288 | 4.8% |
| c | 45475 | 4.3% |
| A | 43891 | 4.2% |
| Other values (17) | 262497 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 843357 | |
| Uppercase Letter | 140354 | 13.3% |
| Space Separator | 69987 | 6.6% |
| Other Punctuation | 140 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 164668 | |
| e | 153830 | |
| t | 69195 | |
| n | 68955 | |
| o | 67779 | |
| i | 57273 | 6.8% |
| a | 50288 | 6.0% |
| c | 45475 | 5.4% |
| m | 39112 | 4.6% |
| h | 37936 | 4.5% |
| Other values (7) | 88846 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 43891 | |
| N | 37936 | |
| E | 29081 | |
| W | 21255 | |
| R | 5191 | 3.7% |
| L | 1176 | 0.8% |
| M | 1032 | 0.7% |
| P | 792 | 0.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 69987 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 140 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 983711 | |
| Common | 70127 | 6.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 164668 | |
| e | 153830 | |
| t | 69195 | 7.0% |
| n | 68955 | 7.0% |
| o | 67779 | 6.9% |
| i | 57273 | 5.8% |
| a | 50288 | 5.1% |
| c | 45475 | 4.6% |
| A | 43891 | 4.5% |
| m | 39112 | 4.0% |
| Other values (15) | 223245 |
Common
| Value | Count | Frequency (%) |
| (space) | 69987 | 99.8% |
| / | 140 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1053838 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 164668 | |
| e | 153830 | |
| (space) | 69987 | 6.6% |
| t | 69195 | 6.6% |
| n | 68955 | 6.5% |
| o | 67779 | 6.4% |
| i | 57273 | 5.4% |
| a | 50288 | 4.8% |
| c | 45475 | 4.3% |
| A | 43891 | 4.2% |
| Other values (17) | 262497 |
title
Categorical
HIGH CARDINALITY UNIFORM
| Distinct | 68293 |
|---|---|
| Distinct (%) | 97.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| Optics InfoBase Conference Papers | 51 |
|---|---|
| 22nd International Congress on Sound and Vibration, ICSV 2015 | 11 |
| 41st EPS Conference on Plasma Physics, EPS 2014 | 10 |
| Proceedings of the ASME Turbo Expo | 9 |
| IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 7 |
| Other values (68288) | 70139 |
Length
| Max length | 444 |
|---|---|
| Median length | 248 |
| Mean length | 62.681775 |
| Min length | 2 |
Characters and Unicode
| Total characters | 4401953 |
|---|---|
| Distinct characters | 87 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 66867 |
|---|---|
| Unique (%) | 95.2% |
Sample
| 1st row | Journal of Technology in Counseling |
|---|---|
| 2nd row | Journal of the Experimental Analysis of Behavior |
| 3rd row | Journal of the History of the Behavioral Sciences |
| 4th row | Journal of Trauma and Dissociation |
| 5th row | Journal of Traumatic Stress |
Common Values
| Value | Count | Frequency (%) |
| Optics InfoBase Conference Papers | 51 | 0.1% |
| 22nd International Congress on Sound and Vibration, ICSV 2015 | 11 | < 0.1% |
| 41st EPS Conference on Plasma Physics, EPS 2014 | 10 | < 0.1% |
| Proceedings of the ASME Turbo Expo | 9 | < 0.1% |
| IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 7 | < 0.1% |
| National Radio Science Conference, NRSC, Proceedings | 7 | < 0.1% |
| 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014 | 7 | < 0.1% |
| Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM | 7 | < 0.1% |
| DOLAP: Proceedings of the ACM International Workshop on Data Warehousing and OLAP | 6 | < 0.1% |
| Proceedings of the Electronic Packaging Technology Conference, EPTC | 6 | < 0.1% |
| Other values (68283) | 70106 | 99.8% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| and | 31436 | 5.4% |
| of | 27668 | 4.7% |
| on | 23799 | 4.1% |
| international | 22208 | 3.8% |
| conference | 21554 | 3.7% |
| 18233 | 3.1% | |
| proceedings | 18050 | 3.1% |
| the | 14273 | 2.4% |
| journal | 10937 | 1.9% |
| in | 7122 | 1.2% |
| Other values (31286) | 391802 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 516927 | 11.7% |
| n | 388063 | 8.8% |
| e | 365186 | 8.3% |
| o | 304133 | 6.9% |
| i | 260449 | 5.9% |
| a | 241835 | 5.5% |
| t | 223534 | 5.1% |
| r | 210511 | 4.8% |
| s | 148099 | 3.4% |
| c | 147729 | 3.4% |
| Other values (77) | 1595487 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3044695 | |
| Uppercase Letter | 556104 | 12.6% |
| Space Separator | 516927 | 11.7% |
| Decimal Number | 208172 | 4.7% |
| Other Punctuation | 47795 | 1.1% |
| Dash Punctuation | 24581 | 0.6% |
| Open Punctuation | 1757 | < 0.1% |
| Close Punctuation | 1752 | < 0.1% |
| Math Symbol | 152 | < 0.1% |
| Connector Punctuation | 18 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 388063 | |
| e | 365186 | |
| o | 304133 | |
| i | 260449 | |
| a | 241835 | 7.9% |
| t | 223534 | 7.3% |
| r | 210511 | 6.9% |
| s | 148099 | 4.9% |
| c | 147729 | 4.9% |
| l | 129958 | 4.3% |
| Other values (16) | 625198 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 74882 | |
| I | 67952 | |
| S | 60891 | |
| E | 59521 | |
| P | 46901 | |
| A | 43460 | 7.8% |
| M | 32939 | 5.9% |
| T | 24405 | 4.4% |
| R | 18317 | 3.3% |
| D | 15284 | 2.7% |
| Other values (16) | 111552 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 33321 | |
| : | 5506 | 11.5% |
| ' | 2817 | 5.9% |
| / | 2603 | 5.4% |
| . | 1865 | 3.9% |
| ; | 761 | 1.6% |
| & | 382 | 0.8% |
| " | 268 | 0.6% |
| # | 176 | 0.4% |
| ? | 30 | 0.1% |
| Other values (4) | 66 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 60981 | |
| 2 | 52105 | |
| 1 | 45042 | |
| 5 | 7477 | 3.6% |
| 9 | 7409 | 3.6% |
| 4 | 7390 | 3.5% |
| 8 | 7300 | 3.5% |
| 3 | 7183 | 3.5% |
| 6 | 6989 | 3.4% |
| 7 | 6296 | 3.0% |
Math Symbol
| Value | Count | Frequency (%) |
| = | 103 | |
| + | 48 | |
| | | 1 | 0.7% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 24580 | |
| – | 1 | < 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1735 | |
| [ | 22 | 1.3% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1730 | |
| ] | 22 | 1.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 516927 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 18 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3600799 | |
| Common | 801154 | 18.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 388063 | 10.8% |
| e | 365186 | 10.1% |
| o | 304133 | 8.4% |
| i | 260449 | 7.2% |
| a | 241835 | 6.7% |
| t | 223534 | 6.2% |
| r | 210511 | 5.8% |
| s | 148099 | 4.1% |
| c | 147729 | 4.1% |
| l | 129958 | 3.6% |
| Other values (42) | 1181302 |
Common
| Value | Count | Frequency (%) |
| (space) | 516927 | 64.5% |
| 0 | 60981 | 7.6% |
| 2 | 52105 | 6.5% |
| 1 | 45042 | 5.6% |
| , | 33321 | 4.2% |
| - | 24580 | 3.1% |
| 5 | 7477 | 0.9% |
| 9 | 7409 | 0.9% |
| 4 | 7390 | 0.9% |
| 8 | 7300 | 0.9% |
| Other values (25) | 38622 | 4.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4401952 | |
| Punctuation | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 516927 | 11.7% |
| n | 388063 | 8.8% |
| e | 365186 | 8.3% |
| o | 304133 | 6.9% |
| i | 260449 | 5.9% |
| a | 241835 | 5.5% |
| t | 223534 | 5.1% |
| r | 210511 | 4.8% |
| s | 148099 | 3.4% |
| c | 147729 | 3.4% |
| Other values (76) | 1595486 |
Punctuation
| Value | Count | Frequency (%) |
| – | 1 |
type
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 548.8 KiB |
| journal | 34331 |
|---|---|
| conference and proceedings | 33645 |
| book series | 1462 |
| trade journal | 789 |
Length
| Max length | 26 |
|---|---|
| Median length | 13 |
| Mean length | 16.253378 |
| Min length | 7 |
Characters and Unicode
| Total characters | 1141426 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | journal |
|---|---|
| 2nd row | journal |
| 3rd row | journal |
| 4th row | journal |
| 5th row | journal |
Common Values
| Value | Count | Frequency (%) |
| journal | 34331 | 48.9% |
| conference and proceedings | 33645 | 47.9% |
| book series | 1462 | 2.1% |
| trade journal | 789 | 1.1% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| journal | 35120 | |
| conference | 33645 | |
| and | 33645 | |
| proceedings | 33645 | |
| book | 1462 | 1.0% |
| series | 1462 | 1.0% |
| trade | 789 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 171938 | |
| n | 169700 | |
| o | 105334 | |
| r | 104661 | |
| c | 100935 | |
| a | 69554 | 6.1% |
| (space) | 69541 | 6.1% |
| d | 68079 | 6.0% |
| s | 36569 | 3.2% |
| j | 35120 | 3.1% |
| Other values (9) | 209995 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1071885 | |
| Space Separator | 69541 | 6.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 171938 | |
| n | 169700 | |
| o | 105334 | |
| r | 104661 | |
| c | 100935 | |
| a | 69554 | 6.5% |
| d | 68079 | 6.4% |
| s | 36569 | 3.4% |
| j | 35120 | 3.3% |
| l | 35120 | 3.3% |
| Other values (8) | 174875 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 69541 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1071885 | |
| Common | 69541 | 6.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 171938 | |
| n | 169700 | |
| o | 105334 | |
| r | 104661 | |
| c | 100935 | |
| a | 69554 | 6.5% |
| d | 68079 | 6.4% |
| s | 36569 | 3.4% |
| j | 35120 | 3.3% |
| l | 35120 | 3.3% |
| Other values (8) | 174875 |
Common
| Value | Count | Frequency (%) |
| (space) | 69541 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1141426 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 171938 | |
| n | 169700 | |
| o | 105334 | |
| r | 104661 | |
| c | 100935 | |
| a | 69554 | 6.1% |
| (space) | 69541 | 6.1% |
| d | 68079 | 6.0% |
| s | 36569 | 3.2% |
| j | 35120 | 3.1% |
| Other values (9) | 209995 |
| | sourceid | region | type |
|---|---|---|---|
| sourceid | 1.000 | 0.090 | 0.315 |
| region | 0.090 | 1.000 | 0.337 |
| type | 0.315 | 0.337 | 1.000 |
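The report does not say which coefficient fills this matrix. For a pair of categorical columns such as `region` and `type`, Cramér's V is one common association measure. The sketch below assumes that measure, without bias correction, and runs on made-up pairs, so it will not reproduce the values above.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Cramér's V for a list of (x, y) category pairs; 0 = no association, 1 = perfect."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    k = min(len(row), len(col)) - 1
    if n == 0 or k < 1:
        return 0.0
    chi2 = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n  # count expected under independence
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Hypothetical data: region fully determines type, then no relationship at all.
perfect = [("Northern America", "journal")] * 3 + [("Western Europe", "book series")] * 3
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(cramers_v(perfect), cramers_v(independent))
```

Like the matrix above, the measure is symmetric and equals 1 on the diagonal, since every column is perfectly associated with itself.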
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
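The report counts 0 missing cells, yet the Common Values tables show 33355 `None` strings in `coverage`, 33944 in `publisher`, and 33072 `-` entries in `issn`. These are ordinary strings, so they count as present. The sketch below counts such placeholders in a list of row dictionaries. Treating them as missing is an assumption about the data, and the helper name is hypothetical.

```python
# ASSUMPTION: these placeholder strings stand for "no value" in the listed columns.
PLACEHOLDERS = {"coverage": {"None"}, "publisher": {"None"}, "issn": {"-"}}

def count_placeholders(rows):
    """Count, per column, the cells holding a placeholder string instead of real data."""
    counts = {column: 0 for column in PLACEHOLDERS}
    for row in rows:
        for column, markers in PLACEHOLDERS.items():
            if row.get(column) in markers:
                counts[column] += 1
    return counts

# Two illustrative rows (the second is hypothetical, not taken from the dataset).
rows = [
    {"coverage": "1958-2021", "publisher": "Wiley-Blackwell", "issn": "00225002, 19383711"},
    {"coverage": "None", "publisher": "None", "issn": "-"},
]
print(count_placeholders(rows))
```

Converting these placeholders to real nulls before profiling would make the nullity views informative, because close to half of the rows carry at least one of them.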
| | sourceid | country | coverage | issn | publisher | region | title | type |
|---|---|---|---|---|---|---|---|---|
| 0 | 12000 | United States | 1999-2003, 2005, 2008 | 15276228 | Columbus State University | Northern America | Journal of Technology in Counseling | journal |
| 1 | 12001 | United States | 1958-2021 | 00225002, 19383711 | Wiley-Blackwell | Northern America | Journal of the Experimental Analysis of Behavior | journal |
| 2 | 12002 | United States | 1969-2021 | 00225061, 15206696 | John Wiley & Sons Inc. | Northern America | Journal of the History of the Behavioral Sciences | journal |
| 3 | 12004 | United States | 2000-2021 | 15299740, 15299732 | Routledge | Northern America | Journal of Trauma and Dissociation | journal |
| 4 | 12005 | United States | 1988-2021 | 15736598, 08949867 | Wiley-Blackwell | Northern America | Journal of Traumatic Stress | journal |
| 5 | 12006 | United States | 1971-2021 | 10959084, 00018791 | Academic Press Inc. | Northern America | Journal of Vocational Behavior | journal |
| 6 | 12007 | Hungary | 1946, 1948, 1977-1999 | 00390690 | Kozponti Statisztikai Hivatal | Eastern Europe | Statisztikai Szemle | journal |
| 7 | 12008 | Hungary | 1980, 1982-1983, 1985, 2016-2021 | 20648251, 00187828 | Hungarian Central Statistical Office | Eastern Europe | Teruleti Statisztika | journal |
| 8 | 12009 | Germany | 2000-2018 | 09426051 | J.C. Cotta'sche Buchhandlung Nachvolger GmbH | Western Europe | Kinderanalyse (discontinued) | journal |
| 9 | 12010 | United States | 1950-1958, 1960-1963, 1965-2021 | 00664308, 15452085 | Annual Reviews Inc. | Northern America | Annual Review of Psychology | journal |
| | sourceid | country | coverage | issn | publisher | region | title | type |
|---|---|---|---|---|---|---|---|---|
| 70217 | 21101058963 | United States | 2016-2021 | 20597991 | SAGE Publications Inc. | Northern America | Methodological Innovations | journal |
| 70218 | 21101058966 | Denmark | 2021 | 22468498 | Aalborg University Press | Western Europe | Journal of Somaesthetics | journal |
| 70219 | 21101059010 | Netherlands | 2021 | 25424246, 25424238 | Brill Academic Publishers | Western Europe | International Journal of Asian Christianity | journal |
| 70220 | 21101059012 | Germany | 2021 | 25693263 | Walter de Gruyter GmbH | Western Europe | Chemistry Teacher International | journal |
| 70221 | 21101059299 | United States | 2014-2021 | 23482451, 23220058 | SAGE Publications Inc. | Northern America | Asian Journal of Legal Education | journal |
| 70222 | 21101059300 | Ukraine | 2021 | 20753829, 20753810 | V. N. Karazin Kharkiv National University | Eastern Europe | Biophysical Bulletin | journal |
| 70223 | 21101059489 | United States | 2021 | 25735985 | EnPress Publisher, LLC | Northern America | Trends in Immunotherapy | journal |
| 70224 | 21101059784 | China | 2020-2021 | 20961146 | Chinese Academy of Sciences | Asiatic Region | Journal of Cyber Security | journal |
| 70225 | 21101059785 | Thailand | 2015-2021 | 24523151 | Kasetsart University Research and Development Institute | Asiatic Region | Kasetsart Journal of Social Sciences | journal |
| 70226 | 21101059786 | United States | 2019-2020 | 15297470, 15336239 | Brookings Institution Press | Northern America | Economia | journal |