[Planetlab-devel] inter-plc federation - email uniqueness - proposed patch & discussion

Thierry Parmentelat Thierry.Parmentelat at sophia.inria.fr
Mon Jun 11 10:32:48 EDT 2007


Hi Tony

I think I have a working version for this email-namespace problem
Please find below the patch I am suggesting -- I have not mentioned the 
change in Nodes.py (about validate_last_contact) since I think you 
already merged it in the princeton codebase.

The only additional trick is to artificially update the instances of 
persons as returned from GetPeerData. More specifically, GetPeerData 
returns person dicts where peer_id is set to None (since we've asked for 
the peering end's local persons), so I just need to set it to the local 
peer_id that depicts the peering end, and use the idea that I described 
last week, about having validate_email check for duplicates on the 
candidate person's peer_id only.

==
As I said I am now going to review GetPeerData, and try and narrow the 
scope of the fields that get cached for federation. For instance with 
the current code, and since the introduction of last_contact, I have 
almost all nodes that get refreshed during two successive calls to 
RefreshPeer; I suspect this rather unfriendly behaviour to be due to 
last_contact being changed frequently (how often is that supposed to be, 
btw ?)
I hope this review can also drastically improve the overall performance 
of RefreshPeer -- at least by reducing the size of the xml traffic -- 
but to that end I am back to a decent federation test setup so this will 
take a while.
So more on this later.
==

I've been thinking about the possible implications of this change on the 
web pages, but as far as I can tell at this stage, there should not be 
any in fact.

-- Thierry

$ svn diff
Index: PLC/Persons.py
===================================================================
--- PLC/Persons.py      (revision 492)
+++ PLC/Persons.py      (working copy)
@@ -100,7 +100,13 @@
         if len(domain) < 2:
             raise invalid_email
 
-        conflicts = Persons(self.api, [email])
+       # Thierry - June 11 2007
+       # check only against users on the same peer
+       if 'peer_id' in self:
+           namespace_peer_id = self['peer_id']
+       else:
+           namespace_peer_id = None
+        conflicts = Persons(self.api, 
{'email':email,'peer_id':namespace_peer_id})
         for person in conflicts:
             if 'person_id' not in self or self['person_id'] != 
person['person_id']:
                 raise PLCInvalidArgument, "E-mail address already in use"
Index: PLC/Methods/RefreshPeer.py
===================================================================
--- PLC/Methods/RefreshPeer.py  (revision 492)
+++ PLC/Methods/RefreshPeer.py  (working copy)
@@ -228,6 +228,10 @@
 
         # Keyed on foreign person_id
         old_peer_persons = Persons(self.api, {'peer_id': peer_id}, 
columns).dict('peer_person_id')
+       # artificially attach the persons returned by GetPeerData to the 
new peer
+       # this is because validate_email needs peer_id to be correct 
when checking for duplicates
+       for person in peer_tables['Persons']:
+           person['peer_id']=peer_id
         persons_at_peer = dict([(peer_person['person_id'], peer_person) \
                                 for peer_person in peer_tables['Persons']])
 


Tony Mack wrote:
> Hi Thierry,
>
> It seems this issue is a little tricky, but I think you are headed in 
> the right direction. I will be bringing up an alternate plc soon, 
> which will act as a test environment for new development. We can use 
> this alternate plc to help us test these federation issues.
>
> BTW, I will update Nodes.py to validate last_contact, thanks for 
> pointing this out.
>
> -- Tony
>
> Thierry Parmentelat wrote:
>> Tony
>> I did some progress on this topic:
>> - the patch I sent does not work as is because as I suspected there 
>> are cases where 'peer_id' is not provided at all to validate_email
>> - the protected version where we filter against peer_id when it is 
>> provided, or against peer_id=None when it's not, is half good. With 
>> this version I can locally add an e-mail address that is already 
>> present from another peer
>> - but I cannot successfully cache a remote email that would have 
>> already locally registered. This is much more tricky because 
>> RefreshPeer turns out to call sync on a Person object that has no 
>> peer_id set yet.
>>
>> So please do not waste too much time at this stage on 'how' to 
>> implement this, I'll have to spend more time on this next week.
>> But I'd still like your inputs on how this should be tested, because 
>> it clearly could have weird side-effects.
>>
>>
>> In the meanwhile, there definitely is something that you might 
>> consider doing on your side, in Nodes.py
>>
>>     validate_date_created = Row.validate_timestamp
>>     validate_last_updated = Row.validate_timestamp
>> +    validate_last_contact = Row.validate_timestamp
>>
>>     def update_last_contact(self, commit = True):
>>
>> which allows the last_contact field that you recently added, to get 
>> correctly refreshed from the integer-date we get through GetpeerData 
>> and xmlrpc.
>>
>> By the way, I also plan on reviewing the set of data that flows 
>> through GetPeerData.
>> It turns out this last_contact field gets cached togother with a lot 
>> more data that is definitely not crucial to federation, and
>> I am pretty sure I can substantially improve the overall performance. 
>> This is another story, but until I do this review you'll need this 
>> patch so RefreshPeer can work on your side.
>>
>> -- Thierry
>>
>>
>> Thierry Parmentelat wrote:
>>> Hi Tony and all
>>>
>>> I am currently trying to address the following issue.
>>> The current status of the api & RefreshPeer thing has a major 
>>> drawback. During the transitional period where European users will 
>>> start switching from the Public PlanetLab at PLC to PlanetLabEurope, 
>>> there may be a need for a user to be registered at both myplc's at 
>>> the same time.
>>> In this particular case, the current software does not allow the 
>>> user to use the same email address on both ends. The reasons for 
>>> this are that
>>> - AddPerson uses validate_email in Persons.py, that in turn checks 
>>> for any Person with that new email address and raises an exception 
>>> if such an entry can be found
>>> - RefreshPeer basically uses the same validate_email code when it 
>>> tries to sync newly created local objects
>>> -  Note however that no hard-wired constraints in the DB is enabled 
>>> here
>>>
>>>
>>> I am currently validating the following patch in PLC/Persons.py
>>>         if len(domain) < 2:
>>>             raise invalid_email
>>>
>>> -        conflicts = Persons(self.api, [email])
>>> +       # Thierry - June 8 2007
>>> +       # check only against users on the same peer
>>> +       # xxx should we protect against cases where self does not 
>>> hold a 'peer_id' key ?
>>> +        conflicts = Persons(self.api, 
>>> {'email':email,'peer_id':self['peer_id']})
>>>         for person in conflicts:
>>>             if 'person_id' not in self or self['person_id'] != 
>>> person['person_id']:
>>>
>>> As you see I just alter the uniqueness check by relaxing the 
>>> constraint, and prevent email duplication on a given peer only.
>>>
>>> I am having a hard time setting up a reasonable test environment to 
>>> check this feature as thoroughly as possible, so I cannot commit on 
>>> its correctness yet.
>>> I would appreciate to know your feelings about it anyway, that might 
>>> save me some time especially if this is a wrong track. So:
>>> - first I'd like to know whether you feel comfortable with that 
>>> patch and hear your comments if any. In particular do you think 
>>> there might be cases where validate_email can be called on a 'self' 
>>> that would not hold a 'peer_id' filed ?
>>> - and second, during my test so far I've noticed that this same 
>>> validate_email thing was also called when performing deletions. I 
>>> assume this applies to all types of objects. My feeling is that this 
>>> sounds awkward - for instance if one changes the validation for a 
>>> given entity to make it more strict, then injecting the new code 
>>> will prevent us from cleanly deleting entities that do not fulfill 
>>> the validation criteria - which are precisely the ones we'd like to 
>>> delete. This is of course much independant from the central issue I 
>>> am addressing here, but your comments about this would be 
>>> appreciated too.
>>>
>>>
>>> Thanks -- Thierry
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel



More information about the Devel mailing list