Wikidata:Property proposal/unusualness

unusualness

Originally proposed at Wikidata:Property proposal/Generic

   Not done
Description: how unusual is this item compared to other items of this type
Represents: any
Data type: Number (not available yet)
Domain: item
Allowed values: 0 to number of instances
Allowed units: none or "items"
Example 1: 200
Example 2: 1000
Example 3: 10000
Example 4: 100000
Planned use: filtering unusual/exotic instances with extra complexity that are not worthwhile to handle
Expected completeness: complete for values over 1000 that hurt

Motivation

https://www.wikidata.org/wiki/Q7395156 is an example of an unusual instance. While most instances of scientific conference series (https://www.wikidata.org/wiki/Q47258130) relate to a single conference series, this one is a "pair". This is an attempt to model a real-world situation at the cost of extra complexity. To avoid that complexity, a means of filtering out this unusual instance might be needed.

In my use case today, https://confident.dbis.rwth-aachen.de/dblpconf/wikidata, with the conference series query listed at the end of this section, the above conference series shows up multiple times and has multiple entries for some properties. This increases the complexity of the query and doesn't add enough value for my use case, so I need a means to filter out this unusual record. I could do this by filtering the item by its Wikidata Q identifier, but then I'd have to add each further unusual/exotic case later.

Having an unusualness property with "220" as its value in this case, indicating that this is the only such case out of some 220, would help to filter by "unusualness". If only the most usual records are wanted, a query would keep the items that either have no unusualness property or have a value below a certain threshold.
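A minimal sketch of such a threshold filter follows. It rests on assumptions: the proposed property has no Wikidata property ID, so `ex:unusualness` under the `http://example.org/prop/` namespace is a purely hypothetical stand-in, and the threshold of 100 is an arbitrary illustrative value. Under these assumptions an item with an unusualness of 220 would be dropped, while items without any value would be kept.

```sparql
# Sketch only: ex:unusualness is a hypothetical placeholder for the proposed
# property; no such Wikidata property exists.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex: <http://example.org/prop/>
SELECT ?confSeries
WHERE
{
  # scientific conference series (Q47258130)
  ?confSeries wdt:P31 wd:Q47258130.
  # hypothetical unusualness value; absent for ordinary items
  OPTIONAL { ?confSeries ex:unusualness ?unusualness . }
  # keep items with no value, or with a value below the assumed threshold
  FILTER (!BOUND(?unusualness) || ?unusualness < 100)
}
```

The `!BOUND(...)` branch is what makes the default case cheap: only the exotic items would need a statement at all.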

There are many other "long tail" situations where a similar strategy based on this property might be applicable. Think of the Pareto principle, where in quite a few scenarios only the most usual 20% give 80% of the value. A proper unusualness indicator could filter here.

Which brings up the question of whether decile/n-tile properties are already available for setting values like the unusualness...

# Conference Series wikidata query
# see https://confident.dbis.rwth-aachen.de/dblpconf/wikidata
# WF 2021-01-30
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?confSeries ?display_name ?confSeriesLabel ?official_website ?DBLP_pid ?WikiCFP_pid ?GND_pid
WHERE 
{
  # scientific conference series (Q47258130)
  ?confSeries wdt:P31 wd:Q47258130.
  # English label; matched before the BIND so COALESCE can fall back to it
  ?confSeries rdfs:label ?confSeriesLabel.
  FILTER (lang(?confSeriesLabel) = "en")
  # short name (P1813)
  OPTIONAL { ?confSeries wdt:P1813 ?short_name . }
  BIND (COALESCE(?short_name,?confSeriesLabel) AS ?display_name).
  # official website (P856)
  OPTIONAL {
    ?confSeries wdt:P856 ?official_website
  }
  # any item with a DBLP venue ID (P8926)
  OPTIONAL {
    ?confSeries wdt:P8926 ?DBLP_pid.
  }
  # WikiCFP pid (P5127)
  OPTIONAL {
    ?confSeries wdt:P5127 ?WikiCFP_pid.
  }
  # GND pid (P227)
  OPTIONAL {
    ?confSeries wdt:P227 ?GND_pid.
  }
}
ORDER BY (?display_name)

 – The preceding unsigned comment was added by Seppl2013 (talk • contribs) at 13:27, January 30, 2021‎ (UTC).

Discussion