Wikidata:WikiProject Heritage institutions/Data structure/Data modelling issues


This page supports the tackling of data modelling issues related to the description of heritage institutions on Wikidata.

This Google Doc contains a tentative list of data modelling issues related to heritage institutions. Please help complement it and/or contribute to the description of issues on this wiki page!

Cases of Problematic Data Modelling

"Organization", "building", or "collection"?

Many Wikidata items currently conflate organizations, buildings and/or collections. In most cases, this mix-up was caused by automatic data imports from Wikipedia, where it is common practice to cover these different concepts in a single article. On Wikidata, this causes problems as soon as statements are added that could apply to any of these concepts, such as inception (P571). Most museums, for example, have different dates for the construction of the building, the establishment of the collection, and the creation of the organization. If such statements are added indiscriminately, it remains unclear which entity they apply to, and the information becomes difficult to interpret, if not useless.

Proposed Solution

Keep items for "organization", "building", and "collection" strictly apart by assigning them to the respective class (see table below).

Classes

| Type | Network Organization | Organization | Building | Part of a Building, Room | Collection |
|---|---|---|---|---|---|
| museum | n/a | museum (Q33506) | museum building (Q24699794) | n/a | museum collection (Q27699276) |
| archive(s) | n/a | archive (Q166118) | archive building (Q635719) | n/a | archival collection (Q9388534) |
| library | library network (Q28324850) | library (Q7075) | library building (Q856584) | library (Q29843656) | library collection (Q856592) |
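
For tooling that needs to pick the right class when splitting a confounded item, the class table can be held as a simple lookup structure. The following is only a minimal sketch: the QIDs are the ones listed in the table, while the dictionary keys (`network`, `room`, etc.) and the helper name `class_for` are hypothetical names chosen for illustration.

```python
# Class QIDs as listed in the "Classes" table; None marks the "n/a" cells.
# The key names are illustrative assumptions, not an established vocabulary.
HERITAGE_CLASSES = {
    "museum": {
        "network": None,
        "organization": "Q33506",
        "building": "Q24699794",
        "room": None,
        "collection": "Q27699276",
    },
    "archive": {
        "network": None,
        "organization": "Q166118",
        "building": "Q635719",
        "room": None,
        "collection": "Q9388534",
    },
    "library": {
        "network": "Q28324850",
        "organization": "Q7075",
        "building": "Q856584",
        "room": "Q29843656",
        "collection": "Q856592",
    },
}


def class_for(institution_type, concept):
    """Return the class QID for e.g. ("library", "building"), or None for n/a."""
    return HERITAGE_CLASSES[institution_type][concept]
```

For example, `class_for("museum", "collection")` yields the QID of museum collection, whereas `class_for("museum", "network")` yields `None` because the table lists no such class.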

Complications

Some complications arise from the fact that some classes within the class tree themselves confound these different concepts (see the section "Inconsistencies in Hierarchical Class Trees" below). Thus, some of the data modelling issues can be addressed at the level of individual items, while others need to be resolved at the level of the class tree.

To facilitate the correct assignment of items to their respective classes, mismatches between the various language versions of the class definitions also need to be addressed (see the section "Translation & Internationalization Issues" below).

Problematic Items

The table below lists overview tables with country statistics for problematic items as well as maintenance queries that help pinpoint problematic items. Note that the lists do not discriminate between cases where the assignment to different classes occurred at the level of the item and cases where the confusion stems from a badly constructed class tree.

Data Modelling Issues: Confounded Classes

| Type of Issue | Dashboard | Maintenance Queries |
|---|---|---|
| Museums (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
| Archives (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
| Libraries (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
| Museums (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
| Archives (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
| Libraries (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
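
The linked maintenance queries are not reproduced on this page. As a rough illustration of what such a query might look like, the sketch below builds a SPARQL string for items that are instances of two classes at once, optionally restricted to one country. It is an assumption that the actual queries traverse subclasses via `wdt:P31/wdt:P279*`; the function name and parameters are hypothetical. The example uses museum (Q33506) and collection (Q2668072) as cited on this page, with Switzerland (Q39) as the country; the QID for "architectural structure" is not given here and would have to be supplied by the user.

```python
def confounded_class_query(org_class, other_class, country=None, limit=500):
    """Build a SPARQL query for items that are instances of both classes.

    org_class / other_class: class QIDs such as "Q33506".
    country: optional country QID matched against country (P17).
    """
    country_filter = f"  ?item wdt:P17 wd:{country} .\n" if country else ""
    return (
        "SELECT DISTINCT ?item ?itemLabel WHERE {\n"
        # Instance of the organization class or of any of its subclasses.
        f"  ?item wdt:P31/wdt:P279* wd:{org_class} .\n"
        # At the same time an instance of the second (conflicting) class.
        f"  ?item wdt:P31/wdt:P279* wd:{other_class} .\n"
        f"{country_filter}"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Museums in Switzerland that are also typed as collections.
query = confounded_class_query("Q33506", "Q2668072", country="Q39")
```

The resulting string can be pasted into the Wikidata Query Service. Note that two subclass traversals in one query can be expensive, so restricting by country or lowering the limit may be necessary.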

Further problematic items may be found by searching specifically for "organization" items with statements that are typical for architectural structures (e.g. instances of museum (Q33506) with architect (P84) statements). Also, certain identifiers typically refer to either an "organization" or a "building".

Data Modelling Issues: Mismatched Properties 1

| Type of issue | Dashboard | architect (P84) | architectural style (P149) | occupant (P466) |
|---|---|---|---|---|
| Museum (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |
| Archive (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |
| Library (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |

Note that the percentage indications provided by the property dashboard for the different properties do not match the figures obtained from the maintenance queries.

Data Modelling Issues: Mismatched Properties 2

| Type of issue | Dashboard | maintained by (P126) | copyright status (P6216) | level of description (P6224) |
|---|---|---|---|---|
| Museum (= organization) items with statements that are typical for collections | | | | |
| Archive (= organization) items with statements that are typical for collections | | | | |
| Library (= organization) items with statements that are typical for collections | | | | |


[TO DO: complement the tables and dashboards with further properties and with further links to maintenance queries that pinpoint mismatching properties]

Progress Statistics

The tables below are used to track the progress that is being made in cleaning up the respective items and hierarchical class tree.

As the percentage indications provided by the property dashboard for the different properties do not match the figures obtained from the maintenance queries, the absolute numbers obtained from the maintenance queries are used.

Statistics: Problematic Museum Items

| Date | museum (N) | museum & architectural structure (N) | museum & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
|---|---|---|---|---|---|---|
| 2019-09-29 | 45808 | (server error) | 43794 | 1109 | 860 | 137 |
| 2019-10-06 | 45919 | (server error) | 43903 | 1109 | 863 | 139 |
| 2019-10-14 | 45989 | (server error) | 453 | 1109 | 864 | 138 |
| 2019-10-30 | 46304 | (server error) | 630 | 1117 | 911 | 141 |

Statistics: Problematic Archives Items

| Date | archive (N) | archive & architectural structure (N) | archive & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
|---|---|---|---|---|---|---|
| 2019-09-29 | 5583 | 181 | 5348 | 32 | 17 | 3 |
| 2019-10-06 | 5590 | 183 | 5354 | 32 | 17 | 3 |
| 2019-10-14 | 5618 | 184 | 107 | 33 | 17 | 3 |
| 2019-10-30 | 5618 | 186 | 400 | 33 | 17 | 3 |

Statistics: Problematic Library Items

| Date | library (N) | library & architectural structure (N) | library & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
|---|---|---|---|---|---|---|
| 2019-09-29 | 37477 | 3632 | 34352 | 360 | 252 | 26 |
| 2019-10-06 | 56716 | 10969 | 51089 | 360 | 252 | 26 |
| 2019-10-14 | 72047 | 1785 | 191 | 359 | 253 | 27 |
| 2019-10-30 | 73565 | 1612 | 57949 | 365 | 256 | 27 |
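
Since progress is tracked in absolute numbers, the change between two snapshots is a simple difference. The sketch below shows one way this could be computed. The counts are taken from the "archive & architectural structure (N)" column above, the helper name `deltas` is hypothetical, and treating a "(server error)" cell as `None` is an assumption about how missing values might be handled.

```python
# (date, count) pairs from the "archive & architectural structure (N)" column.
archive_and_structure = [
    ("2019-09-29", 181),
    ("2019-10-06", 183),
    ("2019-10-14", 184),
    ("2019-10-30", 186),
]


def deltas(series):
    """Return (date, change since the previous snapshot) for each later snapshot.

    A count of None (e.g. a "(server error)" cell) makes the affected
    changes None instead of raising an error.
    """
    result = []
    for (_, previous), (date, current) in zip(series, series[1:]):
        if previous is None or current is None:
            result.append((date, None))
        else:
            result.append((date, current - previous))
    return result


print(deltas(archive_and_structure))
# [('2019-10-06', 2), ('2019-10-14', 1), ('2019-10-30', 2)]
```

A positive change means that more problematic items were found than at the previous snapshot, which can also reflect newly created or newly classified items rather than a lack of clean-up.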


[TO DO: complement the statistics tables]

Property Constraints

Eventually, property constraints should be used to flag problematic statements. Furthermore, we should look into using Shape Expressions for data validation (cf. EntitySchema:E125; EntitySchema:E90).
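
Until such constraints are defined, the intended rule can be prototyped locally. The following sketch is not a Wikidata property constraint or a Shape Expression; it merely illustrates the kind of check one might encode, namely that an item typed as an organization should not carry properties typical for buildings. The input format (a dictionary with `instance_of` and `properties` lists) and the function name are hypothetical, and only direct instance-of values are considered, not subclasses.

```python
# Organization classes from this page: museum, archive, library.
ORGANIZATION_CLASSES = {"Q33506", "Q166118", "Q7075"}
# Properties typical for architectural structures:
# architect, architectural style, occupant.
BUILDING_PROPERTIES = {"P84", "P149", "P466"}


def flag_statements(item):
    """Return the building-type properties used on an organization item.

    item: dict with "instance_of" (class QIDs) and "properties" (property IDs).
    Returns an empty list if the item is not typed as an organization.
    """
    if not ORGANIZATION_CLASSES & set(item["instance_of"]):
        return []
    return sorted(BUILDING_PROPERTIES & set(item["properties"]))


example = {"instance_of": ["Q33506"], "properties": ["P84", "P571", "P149"]}
print(flag_statements(example))  # ['P149', 'P84']
```

The same pattern could be extended to the collection-type properties listed in "Mismatched Properties 2" once the corresponding rules are agreed upon.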

[TO DO: insert table with proposed property constraints and a column to track their state of implementation.]

[TO DO: review/complement the instructions on the "Data structure" page to reflect the rules defined here.]

Translation & Internationalization Issues

"Organization", "building", or "collection"?

There are various translation issues related to the distinction between organizations, buildings, and collections. For example, as of summer 2019, library (Q7075) was defined as a "collection" in English, as an "institution" in Spanish, and as a "place" in French.

| Item Concerned | Description of Issue | Proposed Solution | Implementation Status |
|---|---|---|---|
| library (Q7075) | Varying definitions across languages: "collection of resources" (en), "facility" (de), "institution" (es, it, nl), "place" (fr), "book depository" (ru) | Align all language versions with the following definition: "institution charged with the care of a collection of literary, musical, artistic, or reference materials, such as books, manuscripts, recordings, or films". | corrected en, fr, and de; removed some obviously wrong definitions in other language versions |
| institution (Q178706) vs. institution (Q3917196) | In many languages, the term "institution" is polysemic, referring to 1. an established law, practice, or custom; 2. an organization. | Define institution (Q178706) as 1. (an established law, practice, or custom); and use institution (Q3917196) to refer to 2. (an organization). | |


[TO DO: List further translation issues and proposed solutions.]

Inconsistencies in Hierarchical Class Trees

"Organization", "building", or "collection"?

The mix-up between organizations, buildings, and collections can also be found within the hierarchical class tree and needs to be sorted out there as well.

| Item Concerned | Description of Issue | Proposed Solution | Implementation Status | Comments |
|---|---|---|---|---|
| library (Q7075) | subclass of "storage" | remove statement | ✓ Done | Some mismatched identifiers were corrected as well; yet, they haven't been verified systematically. Added different from (P1889) statements to distinguish the item from items it is easily confounded with. Added maintained by WikiProject (P6104): WikiProject Heritage institutions (Q69901156) |
| library (Q7075) | subclass of "collection" | remove statement | ✓ Done | |
| library (Q7075) | subclass of "facility" (= place) | remove statement | ✓ Done | |
| main library (Q12317349) | subclass of "library building" - library building (Q856584) | remove statement | ✓ Done | |
| library branch (Q11396180) | subclass of "library building" - library building (Q856584) | remove statement | ✓ Done | |
| GLAM (Q1030034) | defined both as subclass of cultural institution (Q5193377) (= organization) and collection (Q2668072) | remove statement <sub-class of> collection | ✓ Done | Added maintained by WikiProject (P6104): WikiProject Heritage institutions (Q69901156) |

[TO DO: List further inconsistencies and proposed solutions.]