Much work on digital repositories has focused on issues surrounding Open Access (OA) to research outputs (generally pre- or post-prints). If access to an object is truly open, it may seem that access management is not an issue. However, we may distinguish two cases:
The SP has no interest in who is accessing a resource.
Access to resources is open, but the SP is still interested in capturing information about usage.
In the latter case, FAM becomes an issue. On the one hand, personal information may be used purely for statistical purposes rather than for tracking individuals; on the other hand, information relating to specific individuals may itself be of interest. Two examples of the latter are: (i) recording usage data for accounting and audit purposes in grid environments such as the NGS (National Grid Service); (ii) arXiv, which uses IP addresses to block users who download material indiscriminately. Systematic collection of usage data for OA material may also be useful for marketing an IR, for providing feedback to users, and for developing and disseminating the arguments for OA among researchers and others.
A common theme that arises here is the concept of “user registration”, where users are required to provide information in addition to the standard Shibboleth attributes the first time that they access a resource. Examples include the SARoNGS (Shibboleth Access to Resources on the NGS) project for the NGS, and the EThOS system, which requires registration for access to Open Access e-theses.
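The first-access registration pattern can be sketched as a small decision on the SP side. This is a minimal illustration, not the SARoNGS or EThOS implementation: it assumes the SP receives the user's pseudonymous eduPersonTargetedID in an attribute dictionary (the key name is an assumption, since real SP deployments map attribute names in their own configuration), and `RegistrationStore`, `handle_request` and the `registration_required` flag are hypothetical names. The flag distinguishes mandatory registration from the voluntary, incentive-driven kind.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """Extra information a user supplies beyond the standard attributes."""
    targeted_id: str  # pseudonymous eduPersonTargetedID value
    extra: dict       # hypothetical extra fields collected at registration


@dataclass
class RegistrationStore:
    """Hypothetical SP-side store of registrations, keyed by targeted ID."""
    records: dict = field(default_factory=dict)

    def is_registered(self, targeted_id: str) -> bool:
        return targeted_id in self.records

    def register(self, targeted_id: str, extra: dict) -> None:
        # Copy the dict so later changes by the caller do not leak in.
        self.records[targeted_id] = Registration(targeted_id, dict(extra))


def handle_request(store: RegistrationStore, attributes: dict,
                   registration_required: bool) -> str:
    """Decide how the SP treats an authenticated request.

    Returns "grant", "require_registration" (mandatory registration) or
    "grant_and_offer_registration" (access stays open; registration is
    offered as an optional extra).
    """
    # Assumed attribute key; real SPs configure their own attribute names.
    targeted_id = attributes["eduPersonTargetedID"]
    if store.is_registered(targeted_id):
        return "grant"
    if registration_required:
        return "require_registration"
    return "grant_and_offer_registration"
```

With `registration_required=True`, a first-time user is sent to a registration step before any access. With `False`, the repository stays open and registration is merely offered, which matches the voluntary model discussed in the following paragraph.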
Personal information may also be useful for personalisation, as in the GoldDust project. People are generally willing to trade personal information for benefits, so long as this is done openly and up front. If users are required to register and log in, the repository is no longer Open Access; but with the carrot of added functionality, users may be encouraged to register on a voluntary basis. If registration is not voluntary and transparent, people are likely to object.
A general issue here is that the current federation rules place strict restrictions on how long logs may be kept and on the uses that may be made of personal data, as anonymity is a current priority. However, there is some flexibility: logs can be kept for longer than the standard 6 months if the IdP agrees, and the rules are open to modification if an appropriate justification can be made.
For some use cases, the limit on how long logs can be kept is not an issue: the raw information is needed only in the short term, and once the logs have been used to generate statistical information it is not clear that they are needed further. In any case, the logs will be of little use in the long term, as the IdP will not maintain the mapping between eduPersonTargetedID values and named individuals in perpetuity. On the other hand, if the repository management wants to look at longer-term trends, and at the effects of external events (such as declarations and other publicity related to OA), it is difficult to predict what raw data may be needed for analysis in the future. These considerations are closely linked to recent JISC work on Usage Statistics.
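The trade-off between short-term raw logs and long-term statistics can be illustrated with a minimal sketch. It assumes each raw log entry records a timestamp, the user's pseudonymous eduPersonTargetedID and the item accessed; `AccessEvent`, the helper functions and the 183-day approximation of the 6-month retention period are all hypothetical choices for illustration, not part of any federation's rules or JISC tooling.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessEvent:
    """One raw log entry (hypothetical structure)."""
    timestamp: datetime
    targeted_id: str  # pseudonymous eduPersonTargetedID value
    item_id: str      # repository item that was accessed


# Roughly the standard 6 months; the exact figure is an assumption.
RETENTION = timedelta(days=183)


def monthly_item_counts(events):
    """Aggregate raw events into anonymous (month, item) access counts."""
    return Counter((e.timestamp.strftime("%Y-%m"), e.item_id) for e in events)


def monthly_unique_users(events):
    """Count distinct pseudonymous users per month; IDs are not retained."""
    users = {}
    for e in events:
        users.setdefault(e.timestamp.strftime("%Y-%m"), set()).add(e.targeted_id)
    return {month: len(ids) for month, ids in users.items()}


def purge_expired(events, now, retention=RETENTION):
    """Keep only the raw events that are still inside the retention window."""
    cutoff = now - retention
    return [e for e in events if e.timestamp >= cutoff]


events = [
    AccessEvent(datetime(2024, 1, 5), "id-a", "eprint-1"),
    AccessEvent(datetime(2024, 1, 9), "id-b", "eprint-1"),
    AccessEvent(datetime(2024, 8, 2), "id-a", "eprint-2"),
]
counts = monthly_item_counts(events)
uniques = monthly_unique_users(events)
kept = purge_expired(events, now=datetime(2024, 9, 1))
```

The aggregated counts survive after the January entries are purged, which covers the short-term use case. The sketch also shows the limitation raised above: any statistic that was not computed before the purge (for example, a new breakdown prompted by a later OA announcement) can no longer be derived from the discarded raw data.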
Anonymity can work to the disadvantage of users. For example, users may not always interact with repositories in the most effective way; in such circumstances, repository managers could help users if they were able to trace and contact them, but this will not be possible unless the IdP agrees to put the SP in touch with the user. It would be possible to ask users, the first time that they access the repository, whether they want this sort of feedback. However, this would mean that the SP would have to trust users to enter their information correctly, and would take on the burden of maintaining the data; it would be better to push this back onto the IdP, where it belongs.
An important theme that arose was “registration”: getting people to provide additional personal information the first time that they access a repository. This was recognised as an area of work common to several scenarios, such as personalisation and consent management.