Please add here your suggestions for new variables not yet included in the draft specs.
Some data useful for developers should be collected, but should only be presented in aggregated, anonymous form.
- Programming language (e.g. PHP)
- Programming language version (e.g. 5.2.3)
- Webserver (e.g. Apache)
- Webserver version (e.g. 2.0.3-prefork)
- Operating System (e.g. Linux, Windows Vista)
- I agree it would be very useful for devs to track the distribution of versions, environments etc. Maybe I can add this to the next minor revision of the draft. -- DarTar
Some more page data might be useful:
- biggest page in bytes
- smallest page in bytes
- size of all pages in bytes
- biggest media file in bytes
- smallest media file in bytes
- size of all media files in bytes
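
As a rough illustration of the six size variables above, here is a minimal Python sketch. It assumes the engine can already produce a list of sizes in bytes, one list for pages and one for media files; `SizeStats` and `size_stats` are hypothetical names, not part of any draft spec.

```python
from dataclasses import dataclass


@dataclass
class SizeStats:
    biggest: int   # largest item, in bytes
    smallest: int  # smallest item, in bytes
    total: int     # sum of all items, in bytes


def size_stats(sizes_in_bytes):
    """Summarise a collection of sizes (pages or media files) in bytes."""
    sizes = list(sizes_in_bytes)
    if not sizes:
        # An empty wiki reports zeros rather than failing.
        return SizeStats(0, 0, 0)
    return SizeStats(max(sizes), min(sizes), sum(sizes))
```

The same function would be called twice, once with the page sizes and once with the media file sizes.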
Andi writes:
- Namespaces are unlimited and can be nested. How to count them?
- The same problem, as far as I can see, applies to categories (they can also be nested and unlimited). As a first approximation (to assess the complexity of a wiki) we could just retrieve the raw sum of available nodes (regardless of their level) - would that be computationally demanding in doku? I agree that we need better indicators of wiki structure. If categories and namespaces can be described as directed graphs then there are a number of indicators we may want to extract to compare different wikis, but I'm not sure how far this can be generalized to all wikis. -- DarTar
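
The "raw sum of available nodes" idea can be sketched in Python as follows, assuming the namespace or category hierarchy is available as a nested mapping from name to children. That representation, the example tree, and the function names are assumptions for illustration only; how each engine exposes its hierarchy will differ.

```python
def count_nodes(tree):
    """Count every node in a nested mapping, regardless of its level."""
    return sum(1 + count_nodes(children) for children in tree.values())


def max_depth(tree):
    """Depth of the deepest node; an empty tree has depth 0."""
    if not tree:
        return 0
    return 1 + max(max_depth(children) for children in tree.values())


# Hypothetical namespace tree: each key is a namespace, each value its children.
namespaces = {
    "wiki": {"syntax": {}, "plugins": {"auth": {}}},
    "playground": {},
}
```

Both functions visit each node once, so the cost grows linearly with the number of namespaces or categories. Depth is shown only as one example of a further structural indicator.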
Camille
Distributions of edits per page and edits per user (that's an activity profile, so it should be related to overall activity perspectives) would already be good
computing the distributions of edits per user/page shouldn't be hard to do *provided that* they are updated for every edit somewhere in the page history and user account
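
A minimal Python sketch of such a distribution, assuming the input is simply one ID per edit (user IDs for the per-user profile, page IDs for the per-page profile). `edit_distribution` is a hypothetical helper name.

```python
from collections import Counter


def edit_distribution(edit_ids):
    """Map each edit count to the number of users (or pages) having that count.

    edit_ids: iterable with one user ID or page ID per edit.
    """
    edits_per_id = Counter(edit_ids)            # ID -> number of edits
    return dict(Counter(edits_per_id.values())) # edit count -> number of IDs
```

For example, `edit_distribution(["a", "b", "a", "c", "a", "b"])` says that one ID has 1 edit, one has 2 and one has 3.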
Dario
andi notes: "Counting total numbers (= of edits) might be very resource intensive, we could provide edits per day"
Camille
when you ask for the revision history in MediaWiki, the list of revisions is immediately available, so its number should be too, right? (same for users & contributions)
Dario
well the problem is to get the whole distribution *per page* or *per user* in cases with large userbases or huge amounts of content
Felipe Ortega (WikiXRay) suggested that it would be great to have less crude indicators, but those would need to be precomputed and stored in a DB to which WT should have access
the problem is, this is a major barrier against adoption (compared to the "plugin-drop-in-and-register" approach)
Camille
mmm I agree, the external DB model would be way too tough for adoption.
let's assume the whole distribution per page/per user is too large, ok, so what about just updates:
each day, counters are incremented for each user who makes edits and each page that gets modified, and this is sent to WT in a compressed format: the wiki platform would append edit information to a separate file, send this file to WT once a day in compressed form, and could then erase it
the file would look like this: each edit is one double word (a user ID or a page ID), and what gets transmitted to WT is simply the list of all edits
so for 1 million edits, that's about 4MB before compression
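
A minimal Python sketch of that log file, assuming each edit is stored as one unsigned 32-bit little-endian integer and the daily file is compressed with gzip. The byte order, the choice of gzip, and the function names are assumptions, not something agreed in this discussion. The actual transfer to WT is left out.

```python
import gzip
import struct

# One edit = one 4-byte "double word" holding a user ID or a page ID (assumed layout).
RECORD = struct.Struct("<I")


def append_edit(path, entity_id):
    """Append a single edit record to the log file."""
    with open(path, "ab") as log:
        log.write(RECORD.pack(entity_id))


def daily_payload(path):
    """Return the gzip-compressed log and empty the file for the next day."""
    with open(path, "r+b") as log:
        raw = log.read()
        log.seek(0)
        log.truncate()
    return gzip.compress(raw)


def decode_payload(payload):
    """Receiving side: turn a payload back into the list of IDs."""
    raw = gzip.decompress(payload)
    return [entity_id for (entity_id,) in RECORD.iter_unpack(raw)]
```

With 4 bytes per record, 1 million edits come to 4,000,000 bytes before compression, which matches the 4MB figure.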
Dario
yeah that sounds sensible, but again, it requires something more than just a plugin that can generate content on the fly, right? this would need structural changes
Camille
in 6 yrs on Wikipedia there have been 250M edits
let's assume the present yearly rate is about half of this figure: that's about 125M edits per yr, that is, roughly 300k edits per day.
that makes a 1.2MB file, let's say 1MB compressed for the English Wikipedia: sounds ok!
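
The back-of-the-envelope estimate can be checked with a few lines of Python, using only the figures quoted above and assuming 4 bytes per edit and a 365-day year. Without rounding, the result is somewhat above the quoted 300k edits and 1.2MB, but of the same order of magnitude.

```python
BYTES_PER_EDIT = 4            # one double word per edit (assumed)
TOTAL_EDITS = 250_000_000     # edits in 6 years, as quoted above

edits_per_year = TOTAL_EDITS // 2              # assumed present yearly rate
edits_per_day = edits_per_year / 365           # about 342k
file_size_mb = edits_per_day * BYTES_PER_EDIT / 1_000_000  # about 1.4 MB uncompressed

print(f"{edits_per_day:,.0f} edits/day, {file_size_mb:.2f} MB/day before compression")
```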
Dario
yes, I'm not so much concerned about size as about the structural changes needed to generate that
but maybe I'm overestimating the problem
Camille
i agree with your fears, but i think even having these distributions requires no more than adding a line to the edit-saving process that dumps the ID of the user + page into a single file, which is then transmitted to WT
computationally, this can't be heavy.
-- and we'll do the distribution computation part, as it is now with FT
Dario
certainly not, but for most engines you would need to change the edit form to do this, which is very likely to be core functionality, not a plugin
CategorySpecs