Scenario 1: Open Access

Much work on digital repositories has focused on issues surrounding Open Access to research outputs (generally pre- or post-prints). If access to an object is truly open, then it may seem that access management is not an issue. However, we may distinguish two cases:

 The SP has no interest in who is accessing a resource.

 Access to resources is open, but the SP is still interested in capturing information about usage.

In the latter case, FAM becomes an issue. On the one hand, personal information may be used purely for statistical purposes rather than for tracking individuals; on the other hand, information relating to specific individuals may itself be of interest. Two examples of the latter are: (i) recording usage data for accounting and audit purposes in grid environments such as the NGS (National Grid Service); (ii) arXiv, which uses IP addresses to block users who download material indiscriminately. Systematic collection of usage data for OA material may also be useful for marketing an IR, providing feedback to users, and developing and disseminating the arguments for OA among researchers and others.

A common theme which arises here is the concept of “user registration”, where users are required to provide information additional to the standard Shibboleth attributes the first time that they access a resource: examples include the SARoNGS (Shibboleth Access to Resources on the NGS) project for the NGS, and the EThOS system, which requires registration for access to Open Access e-theses.

In addition, personal information may also be useful for personalisation, as in the GoldDust project. People are willing to trade personal information for benefits so long as this is done openly and up front. If users are required to register and log in then the repository is no longer Open Access, but with the carrot of added functionality, users may be encouraged to register on a voluntary basis. If registration is not voluntary and transparent in this way, people are likely to object.

A general issue here is that the current federation rules place strict restrictions on how long logs can be kept and on the uses that may be made of personal data, as anonymity is a current priority. However, there is some flexibility: logs can be kept for longer than the standard 6 months if the IdP agrees, and the rules are open to modification if an appropriate justification can be made.

For some use cases, the limitation on how long logs can be kept is not an issue; the raw information is needed only in the short term, and once the logs have been used to generate statistical information it is not clear that they are needed further. In any case, the logs will be of little use in the long term as the IdP will not maintain in perpetuity the mapping between eduPersonTargetedID and named individuals. On the other hand, if the repository management wants to look at longer-term trends, and the effects triggered by external events (such as declarations and other publicity related to OA), then it is difficult to predict what raw data may be needed for analysis in the future. These considerations are closely linked to recent JISC work on Usage Statistics.
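
As a rough illustration of the short-term use case above, the sketch below turns raw access records into monthly statistics and then discards records that have passed a retention window. It is a minimal sketch under stated assumptions, not a description of any federation or repository software: the record fields (`time`, `targeted_id`, `item`), the function names, and the 183-day value standing in for "six months" are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed retention window: roughly six months (hypothetical value).
RETENTION = timedelta(days=183)


def monthly_counts(records):
    """Count accesses and distinct opaque identifiers per calendar month."""
    accesses = Counter()
    users = {}
    for rec in records:
        month = rec["time"].strftime("%Y-%m")
        accesses[month] += 1
        # targeted_id stands in for an opaque per-SP identifier
        # such as eduPersonTargetedID; no names are stored.
        users.setdefault(month, set()).add(rec["targeted_id"])
    return {
        m: {"accesses": accesses[m], "distinct_users": len(users[m])}
        for m in accesses
    }


def prune(records, now):
    """Keep only raw records that are still inside the retention window."""
    return [r for r in records if now - r["time"] <= RETENTION]


# Hypothetical sample log.
log = [
    {"time": datetime(2009, 1, 5), "targeted_id": "a1", "item": "thesis-1"},
    {"time": datetime(2009, 1, 9), "targeted_id": "a1", "item": "thesis-2"},
    {"time": datetime(2009, 1, 20), "targeted_id": "b2", "item": "thesis-1"},
    {"time": datetime(2009, 8, 1), "targeted_id": "b2", "item": "thesis-3"},
]

stats = monthly_counts(log)                  # aggregate first
log = prune(log, now=datetime(2009, 9, 1))   # then drop expired raw records
```

The aggregated figures survive after the raw records are pruned, but any question not anticipated when the statistics were generated can no longer be answered, which is the longer-term concern raised above.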

Anonymity can work to the disadvantage of users. For example, users may not always interact with repositories in the most effective way; in such circumstances, the repository managers would be able to help the users if they were able to trace and contact them, but this will not be possible unless the IdP agrees to put the SP in touch with the user. It would be possible to ask users whether they want this sort of feedback the first time that they access the repository, but this means that the SP would have to trust them to enter their information correctly, and would take upon itself the burden of maintaining the data; it would be better to push this back onto the IdP where it belongs.

Proposed action:

An important theme that arose was ‘registration’: getting people to provide additional personal information the first time that they access a repository. This was recognised as an area of work common to several scenarios, e.g. personalisation, consent management.

Submitted by Neil Jacobs 5 years ago

Vote Activity

  1. Disagreed
    5 years ago
  2. Disagreed
    5 years ago

Comments (5)

  1. Not a priority for the OA use case...

    5 years ago
  2. There seem to me to be various different 'use cases' included in this scenario, which makes it difficult to offer a straight +ve or -ve opinion. It would seem useful for users to have an easy way to register where this would be beneficial to them as users, e.g. to use repositories where registration is required (EThOS, SARoNGS) or to get additional personalised functionality. On the other hand, for institutional or other repositories to request registration in order to gather usage stats seems unreasonable. Surely stats should be at a more generic level?

    The scenario seems to focus on the stats use case so I am giving a -ve vote.

    5 years ago
  3. I agree with Rachel here. It would make sense to separate this out into several different cases, and the log file retention is only a minor issue and not really relevant to OA. Where information is provided at registration, the important thing is to ensure that it is clear to the user why the repository is asking for this, and whether it is possible to register without giving some of it (email addresses are probably the most usual instance). One of the findings of the FLAME project is that a substantial proportion of users have given false information on registration to websites, because they don't want to give out correct contact details, and it would be possible to use FAM to give verified data from an institutional IdP. (If combined with a user's ability to determine what information is released by the IdP, this would remove the need for registration entirely, as the user could choose on a session by session basis to reveal the information needed for personalisation.)

    What I think is also important is that OA usually still leaves at least some roles needing access control, possibly including depositors, editorial roles, and administrator roles. It could perhaps be viewed as a special case of the other scenarios, where the read access permissions are liberal.

    5 years ago
  4. The discussion of log files, however, has made me go back to check exactly what the UK Federation recommends - which has changed fairly recently (Nov 2008). (Note that everything said about logging is recommended, not required and certainly not "strict restrictions": I know of HEIs and SPs which have decided that they do not wish to abide by all these recommendations and informed the federation that this is the case.)

    I don't think it's clear from the original post that there is a distinction between Service Providers and the repository itself. The meaning of SP in the rules and recommendations is defined in the rules, http://www.ukfederation.org.uk/library/uploads/Documents/rules-of-membership.pdf, as "any Member who grants access to End Users to services or resources made available by that Member", which certainly to my mind means that they are separate. So the log files being referred to are not the repository logs, but the SP software logs. The recommendations document, http://www.ukfederation.org.uk/library/uploads/Documents/recommendations-for-use-of-personal-data.pdf, is not always entirely clear about the distinction between the body which owns the SP and the SP software itself, however, so I might not be correct here: the recommendations could be taken to apply to any log file owned by the SP-as-organisation, in which case they would apply to the repository software logs as well as the SP-as-software logs. The first paragraph of Section 4.2 seems to restrict the logfiles being discussed to the SP-as-software logs, which is good, as these logs should only contain "the identifier associated with the subject of the Shibboleth SAML (Security Assertion Markup Language) assertion, not any other information purporting to identify the user" - if this applied to repository log files, it would require removal of such things as information linking the session to a user record in the repository database, for example.

    The six-month limit is in the context of "fault-finding and tracing misuse"; the recommendations then go on to say that "accounting and other purposes may justify longer retention but consideration should be given to removing personal data from the logs if there is no need to account for activity of individual users". So longer retention for accounting purposes is allowed even for SPs which follow the recommendations, though the logs should be deleted once the analysis has been carried out.

    5 years ago
  5. Federated Access Management is entirely orthogonal to this scenario.

    Usage tracking can use any number of methods, and one reliant on FAM has a number of drawbacks.

    Likewise personalisation can be handled in different ways.

    The primary benefit of FAM relates to institutional trust, which is irrelevant here.

    5 years ago