Detailed information about the formed corpora is presented in the table below. The table covers several subcorpora of the general corpus (Version II from 2021 (missing reference)):
Entity type (tag) | Version I: annotations | Version I: reviews | Version II: annotations | Version II: reviews | Version II: average entity length (words) | Balanced subcorpus of Version II: annotations | Balanced subcorpus of Version II: reviews | Subcorpus of 500 texts: annotations | Subcorpus of 500 texts: reviews | Version III: annotations | Version III: reviews |
---|---|---|---|---|---|---|---|---|---|---|---|
Medication | 17875 | 1659 | 33005 | 2799 | -- | 13748 | 1250 | 5967 | 500 | 48075 | 3821 |
Drugname | 4745 | 1655 | 8239 | 2793 | 1.2 | 3503 | 1247 | 1489 | 498 | 11812 | 3815 |
Drugform | 3303 | 1266 | 5997 | 2194 | 1 | 2423 | 960 | 1041 | 387 | 8736 | 3061 |
MedMaker | 954 | 816 | 1720 | 1451 | 1.4 | 750 | 629 | 273 | 228 | 2514 | 2097 |
SourceInfodrug | 1267 | 878 | 2579 | 1579 | 1.7 | 1110 | 683 | 460 | 285 | 3762 | 2278 |
Drugclass | 1786 | 1005 | 3113 | 1684 | 1 | 1317 | 747 | 577 | 313 | 4543 | 2268 |
DrugBrand | 2584 | 1038 | 4656 | 1804 | 1.1 | 2021 | 812 | 873 | 335 | 6786 | 2540 |
Route | 1470 | 817 | 3609 | 1733 | 2.2 | 1440 | 739 | 683 | 317 | 5641 | 2547 |
Duration | 895 | 701 | 1515 | 1194 | 2 | 565 | 463 | 256 | 192 | 1866 | 1470 |
Dosage | 506 | 387 | 960 | 706 | 2.5 | 407 | 313 | 202 | 143 | 1566 | 1060 |
Frequency | 365 | 303 | 617 | 517 | 3.9 | 212 | 187 | 113 | 87 | 849 | 724 |
Disease | 9222 | 1603 | 17403 | 2713 | -- | 6307 | 1180 | 2819 | 478 | 23854 | 3716 |
Diseasename | 2215 | 917 | 4042 | 1628 | 1.2 | 1462 | 657 | 738 | 296 | 4934 | 2096 |
Indication | 2310 | 955 | 4627 | 1783 | 1.7 | 1518 | 670 | 720 | 297 | 7456 | 2631 |
BNE-Pos | 2967 | 1021 | 5620 | 1764 | 2.7 | 1990 | 676 | 809 | 289 | 7475 | 2477 |
NegatedADE | 1532 | 641 | 2804 | 1104 | 3.2 | 1195 | 496 | 481 | 201 | 3600 | 1523 |
Worse* | 83 | 51 | 224 | 134 | 4.6 | 99 | 61 | 52 | 35 | 302 | 190 |
ADE-Neg* | 115 | 68 | 86 | 54 | 4 | 43 | 28 | 19 | 12 | 87 | 55 |
ADR | 843 | 339 | 1778 | 625 | 2.4 | 1752 | 610 | 709 | 177 | 5050 | 1605 |
Note | 2319 | 1004 | 4490 | 1861 | -- | 2273 | 905 | 902 | 359 | 6931 | 2798 |
The corpus contains consumer posts on drugs, which are mentioned 11 812 times and relate to 604 ATC codes. The most popular 20% of the ATC codes (ranked by the number of reviews with corresponding Drugname mentions) comprise 120 codes, whose mentions appear in 3 295 reviews (86% of all reviews). Among them, 22 ATC codes were each reviewed in more than 50 posts (2351 posts in total).
Reviews about domestic and foreign drugs account for 40.59% and 45.25% of all reviews, respectively. The remaining documents (14.16%) contain mentions of multiple drugs, both domestic and foreign, or mentions of drugs whose origin the annotators could not determine. The domestic drugs include “Anaferon” (145 reviews), “Viferon” (140), “Ingavirin” (102) and “Glycine” (101). Examples of foreign drugs mentioned are “Aflubin” (93), “Amizon” (55), “Antigrippin” (65) and “Immunal” (42).
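The coverage figure for the most frequent ATC codes can be illustrated with a short computation. This is a minimal sketch, not the authors' tooling: the input format (a mapping from review id to the ATC codes of its Drugname mentions), the function name `top_code_coverage`, and the choice to round the 20% cut-off down are all assumptions, and the toy data is invented.

```python
from collections import Counter


def top_code_coverage(review_codes, top_fraction=0.2):
    """Rank ATC codes by the number of reviews mentioning them and
    measure how many reviews mention at least one top-ranked code.

    review_codes: dict mapping review id -> iterable of ATC codes (hypothetical format).
    Returns (top_codes, covered_reviews, coverage_share).
    """
    counts = Counter()
    for codes in review_codes.values():
        counts.update(set(codes))  # count each code once per review

    # Assumption: the cut-off is rounded down (20% of 604 codes -> 120).
    n_top = int(len(counts) * top_fraction)
    ranked = sorted(counts, key=lambda code: (-counts[code], code))
    top_codes = set(ranked[:n_top])

    covered = sum(1 for codes in review_codes.values() if top_codes & set(codes))
    share = covered / len(review_codes) if review_codes else 0.0
    return top_codes, covered, share


# Invented toy data: five codes, so the top 20% is a single code.
toy = {
    "r1": ["A"],
    "r2": ["A"],
    "r3": ["A", "B"],
    "r4": ["C"],
    "r5": ["D", "E"],
}
top, covered, share = top_code_coverage(toy)
print(top, covered, share)
```

Under the rounding-down assumption, 20% of 604 codes gives 120, which matches the number of codes quoted above.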
Regarding diseases, the most frequent ICD-10 top-level categories are “X - Diseases of the respiratory system” (1221 reviews); “I - Certain infectious and parasitic diseases” (356 reviews); “V - Mental and behavioural disorders” (227 reviews); and “XIX - Injury, poisoning and certain other consequences of external causes” (137 reviews). The top 5 low-level ICD-10 codes by number of reviews are presented in Fig. 1.
Figure 1. Top 5 low-level disease categories from the ICD-10.
Analysis of the consumers’ motivation to acquire and use drugs (the “sourceInfoDrug” attribute) showed that review authors mainly mention using drugs on professional recommendation: 1473 reviews contain references to doctor prescriptions, 341 refer to recommendations from pharmaceutical specialists, and 334 to doctor recommendations. Some reviews report the use of drugs recommended by relatives (290 reviews), advertisements (114) or the internet (43). The heatmap in Fig. 2 shows the percentages of reviews in which popular drugs co-occurred with different sources (the sources were manually merged into 5 groups by annotators).
Figure 2. Heatmap of the percentages of reviews mentioning different sources of information for the 20 most popular drugs.
It can be seen that most recommendations come from professionals, for example for Isoprinosine (used on medical prescription in 65.85% of cases), Aflubin (44.09%), Anaferon (42.15%) and others. However, for drugs such as Oxolinum (11.39%) or Aphobazolum (10.00%), the rate of use on the advice of patients’ acquaintances is close to or higher than the rate for doctors’ recommendations. Amizon (13.46%) and Kagocel (9.72%) have the highest percentages for mass media (advertisement, internet and other) as the source.
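The per-drug percentages behind such a heatmap can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the input format (one `(drug, sources)` pair per review mentioning the drug), the function name `source_percentages`, and the source group labels are hypothetical, and the denominator is assumed to be all reviews mentioning the drug, which the text does not state explicitly.

```python
from collections import defaultdict


def source_percentages(review_pairs):
    """Compute, for each drug, the percentage of its reviews that mention
    each source group.

    review_pairs: iterable of (drug, sources) pairs, one per review that
    mentions the drug; sources is a collection of source-group labels.
    Returns {drug: {source: percent}}.
    """
    totals = defaultdict(int)                      # reviews per drug
    hits = defaultdict(lambda: defaultdict(int))   # reviews per drug and source
    for drug, sources in review_pairs:
        totals[drug] += 1
        for source in set(sources):                # count a source once per review
            hits[drug][source] += 1
    return {
        drug: {src: 100.0 * n / totals[drug] for src, n in hits[drug].items()}
        for drug in totals
    }


# Invented toy data with hypothetical group labels.
toy_pairs = [
    ("DrugA", {"doctor"}),
    ("DrugA", {"doctor", "relatives"}),
    ("DrugA", set()),
    ("DrugA", {"media"}),
]
print(source_percentages(toy_pairs))
```

Because a single review may cite several sources or none, the percentages for one drug need not sum to 100.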
The distribution of the tonality (positive or negative) for the sources of information is presented in Fig. 3. A source is marked as “positive” if positive dynamics appeared after the use of the drug (i.e. the review includes the “BNE-Pos” attribute). It is marked as “negative” if negative dynamics or a deterioration in health took place, or if the drug had no effect (i.e. “Worse”, “ADE-Neg” or “NegatedADE” mentions appear). Reviews with both kinds of effect were not taken into account. It follows from the diagram that drugs recommended by doctors or pharmacists are more often mentioned as having a positive effect, while drugs used on the basis of an advertisement are more often reported to lead to a deterioration in health.
Figure 3. Tonality, relative to the source of recommendations.
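The tonality rule described above can be written as a small function. This is a minimal sketch assuming that each review is available as a set of tag names plus a set of source groups; the function names, the input format and the source labels in the toy data are hypothetical, while the tag names are those of the annotation scheme.

```python
from collections import Counter

POSITIVE_TAGS = {"BNE-Pos"}
NEGATIVE_TAGS = {"Worse", "ADE-Neg", "NegatedADE"}


def review_tonality(tags):
    """Return "positive", "negative", or None for one review.

    None covers both excluded cases: reviews with positive and negative
    effects together, and reviews with no effect tags at all.
    """
    tags = set(tags)
    has_pos = bool(tags & POSITIVE_TAGS)
    has_neg = bool(tags & NEGATIVE_TAGS)
    if has_pos and has_neg:
        return None  # mixed reviews are not taken into account
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return None


def tonality_by_source(reviews):
    """Count (source, tonality) pairs over reviews given as (sources, tags)."""
    counts = Counter()
    for sources, tags in reviews:
        tonality = review_tonality(tags)
        if tonality is None:
            continue
        for source in set(sources):
            counts[(source, tonality)] += 1
    return counts


# Invented toy data.
toy_reviews = [
    ({"doctor"}, {"BNE-Pos"}),
    ({"doctor"}, {"BNE-Pos", "Worse"}),   # mixed, skipped
    ({"advertisement"}, {"NegatedADE"}),
]
print(tonality_by_source(toy_reviews))
```

Dividing each count by the total for its source would give the relative shares that a diagram of this kind displays.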
The diagrams in Fig. 4 show the shares of reviews in which popular drugs (top 20) were mentioned along with labeled effects. The following drugs have the largest shares of reviews with ADR: the immunomodulator “Isoprinosine” (46.34% of reviews of this drug contain mentions of ADR), the antiviral “Amixin” (40.0%), the tranquilizer “Aphobazolum” (40.0%), the antiviral “Amizon” (36.53%) and the antiviral “Rimantadine” (33.96%).
Figure 4. Distributions of the effect labels reported by reviewers after using drugs.
Users report that some drugs cause negative dynamics at the start of use or after some period of use (ADE-Neg). Examples of such drugs are “Anaferon” (2.7% of reviews of this drug mention ADE-Neg effects), “Viferon” (2.1%), “Glycine” (4.0%) and “Ergoferon” (3.6%).
According to the reviews, some drugs cause a deterioration in health after the course has been taken (the “Worse” label): the immunomodulator “Isoprinosine” (12.2%), the antiviral “Ingavirin” (11.8%), “Ergoferon” (7.3%) and others.