[Planetlab-devel] inter-plc federation - email uniqueness - proposed patch & discussion

Tony Mack tmack at CS.Princeton.EDU
Thu Jun 14 11:52:13 EDT 2007


I will update our code with your patch soon.

Thierry Parmentelat wrote:
> Following up on this topic:
> I have been testing this change on both a real deployment -- the one 
> that runs the peering trial with the public PLC -- and in vitro -- on 
> a dedicated couple of myplcs that run the TestPeers.py code -- and 
> have not seen any drawback with this patch.
>
> I would appreciate an update from Princeton about your position on 
> this proposal, and I would be grateful if this change could make its 
> way until your production code at some point, so I could relax the 
> constraint on email for our users (so far we have been using fake 
> aliases in the @one-lab.org domain).
>
> -- Thierry
>
>
> Thierry Parmentelat wrote:
>> Hi Tony
>>
>> I think I have a working version for this email-namespace problem
>> Please find below the patch I am suggesting -- I have not mentioned 
>> the change in Nodes.py (about validate_last_contact) since I think 
>> you already merged it in the princeton codebase.
>>
>> The only additional trick is to artificially update the instances of 
>> persons as returned from GetPeerData. More specifically, GetPeerData 
>> returns person dicts where peer_id is set to None (since we've asked 
>> for the peering end's local persons), so I just need to set it to the 
>> local peer_id that depicts the peering end, and use the idea that I 
>> described last week, about having validate_email check for duplicates 
>> on the candidate person's peer_id only.
>>
>> ==
>> As I said I am now going to review GetPeerData, and try and narrow 
>> the scope of the fields that get cached for federation. For instance 
>> with the current code, and since the introduction of last_contact, I 
>> have almost all nodes that get refreshed during two successive calls 
>> to RefreshPeer; I suspect this rather unfriendly behaviour to be due 
>> to last_contact being changed frequently (how often is that supposed 
>> to be, btw ?)
>> I hope this review can also drastically improve the overall 
>> performance of RefreshPeer -- at least by reducing the size of the 
>> xml traffic -- but to that end I am back to a decent federation test 
>> setup so this will take a while.
>> So more on this later.
>> ==
>>
>> I've been thinking about the possible implications of this change on 
>> the web pages, but as far as I can tell at this stage, there should 
>> not be any in fact.
>>
>> -- Thierry
>>
>> $ svn diff
>> Index: PLC/Persons.py
>> ===================================================================
>> --- PLC/Persons.py      (revision 492)
>> +++ PLC/Persons.py      (working copy)
>> @@ -100,7 +100,13 @@
>>         if len(domain) < 2:
>>             raise invalid_email
>>
>> -        conflicts = Persons(self.api, [email])
>> +       # Thierry - June 11 2007
>> +       # check only against users on the same peer
>> +       if 'peer_id' in self:
>> +           namespace_peer_id = self['peer_id']
>> +       else:
>> +           namespace_peer_id = None
>> +        conflicts = Persons(self.api, 
>> {'email':email,'peer_id':namespace_peer_id})
>>         for person in conflicts:
>>             if 'person_id' not in self or self['person_id'] != 
>> person['person_id']:
>>                 raise PLCInvalidArgument, "E-mail address already in 
>> use"
>> Index: PLC/Methods/RefreshPeer.py
>> ===================================================================
>> --- PLC/Methods/RefreshPeer.py  (revision 492)
>> +++ PLC/Methods/RefreshPeer.py  (working copy)
>> @@ -228,6 +228,10 @@
>>
>>         # Keyed on foreign person_id
>>         old_peer_persons = Persons(self.api, {'peer_id': peer_id}, 
>> columns).dict('peer_person_id')
>> +       # artificially attach the persons returned by GetPeerData to 
>> the new peer
>> +       # this is because validate_email needs peer_id to be correct 
>> when checking for duplicates
>> +       for person in peer_tables['Persons']:
>> +           person['peer_id']=peer_id
>>         persons_at_peer = dict([(peer_person['person_id'], 
>> peer_person) \
>>                                 for peer_person in 
>> peer_tables['Persons']])
>>
>>
>>
>> Tony Mack wrote:
>>> Hi Thierry,
>>>
>>> It seems this issue is a little tricky, but I think you are headed 
>>> in the right direction. I will be bringing up an alternate plc soon, 
>>> which will act as a test environment for new development. We can use 
>>> this alternate plc to help us test these federation issues.
>>>
>>> BTW, I will update Nodes.py to validate last_contact, thanks for 
>>> pointing this out.
>>>
>>> -- Tony
>>>
>>> Thierry Parmentelat wrote:
>>>> Tony
>>>> I did some progress on this topic:
>>>> - the patch I sent does not work as is because as I suspected there 
>>>> are cases where 'peer_id' is not provided at all to validate_email
>>>> - the protected version where we filter against peer_id when it is 
>>>> provided, or against peer_id=None when it's not, is half good. With 
>>>> this version I can locally add an e-mail address that is already 
>>>> present from another peer
>>>> - but I cannot successfully cache a remote email that would have 
>>>> already locally registered. This is much more tricky because 
>>>> RefreshPeer turns out to call sync on a Person object that has no 
>>>> peer_id set yet.
>>>>
>>>> So please do not waste too much time at this stage on 'how' to 
>>>> implement this, I'll have to spend more time on this next week.
>>>> But I'd still like your inputs on how this should be tested, 
>>>> because it clearly could have weird side-effects.
>>>>
>>>>
>>>> In the meanwhile, there definitely is something that you might 
>>>> consider doing on your side, in Nodes.py
>>>>
>>>>     validate_date_created = Row.validate_timestamp
>>>>     validate_last_updated = Row.validate_timestamp
>>>> +    validate_last_contact = Row.validate_timestamp
>>>>
>>>>     def update_last_contact(self, commit = True):
>>>>
>>>> which allows the last_contact field that you recently added, to get 
>>>> correctly refreshed from the integer-date we get through 
>>>> GetpeerData and xmlrpc.
>>>>
>>>> By the way, I also plan on reviewing the set of data that flows 
>>>> through GetPeerData.
>>>> It turns out this last_contact field gets cached togother with a 
>>>> lot more data that is definitely not crucial to federation, and
>>>> I am pretty sure I can substantially improve the overall 
>>>> performance. This is another story, but until I do this review 
>>>> you'll need this patch so RefreshPeer can work on your side.
>>>>
>>>> -- Thierry
>>>>
>>>>
>>>> Thierry Parmentelat wrote:
>>>>> Hi Tony and all
>>>>>
>>>>> I am currently trying to address the following issue.
>>>>> The current status of the api & RefreshPeer thing has a major 
>>>>> drawback. During the transitional period where European users will 
>>>>> start switching from the Public PlanetLab at PLC to 
>>>>> PlanetLabEurope, there may be a need for a user to be registered 
>>>>> at both myplc's at the same time.
>>>>> In this particular case, the current software does not allow the 
>>>>> user to use the same email address on both ends. The reasons for 
>>>>> this are that
>>>>> - AddPerson uses validate_email in Persons.py, that in turn checks 
>>>>> for any Person with that new email address and raises an exception 
>>>>> if such an entry can be found
>>>>> - RefreshPeer basically uses the same validate_email code when it 
>>>>> tries to sync newly created local objects
>>>>> -  Note however that no hard-wired constraints in the DB is 
>>>>> enabled here
>>>>>
>>>>>
>>>>> I am currently validating the following patch in PLC/Persons.py
>>>>>         if len(domain) < 2:
>>>>>             raise invalid_email
>>>>>
>>>>> -        conflicts = Persons(self.api, [email])
>>>>> +       # Thierry - June 8 2007
>>>>> +       # check only against users on the same peer
>>>>> +       # xxx should we protect against cases where self does not 
>>>>> hold a 'peer_id' key ?
>>>>> +        conflicts = Persons(self.api, 
>>>>> {'email':email,'peer_id':self['peer_id']})
>>>>>         for person in conflicts:
>>>>>             if 'person_id' not in self or self['person_id'] != 
>>>>> person['person_id']:
>>>>>
>>>>> As you see I just alter the uniqueness check by relaxing the 
>>>>> constraint, and prevent email duplication on a given peer only.
>>>>>
>>>>> I am having a hard time setting up a reasonable test environment 
>>>>> to check this feature as thoroughly as possible, so I cannot 
>>>>> commit on its correctness yet.
>>>>> I would appreciate to know your feelings about it anyway, that 
>>>>> might save me some time especially if this is a wrong track. So:
>>>>> - first I'd like to know whether you feel comfortable with that 
>>>>> patch and hear your comments if any. In particular do you think 
>>>>> there might be cases where validate_email can be called on a 
>>>>> 'self' that would not hold a 'peer_id' filed ?
>>>>> - and second, during my test so far I've noticed that this same 
>>>>> validate_email thing was also called when performing deletions. I 
>>>>> assume this applies to all types of objects. My feeling is that 
>>>>> this sounds awkward - for instance if one changes the validation 
>>>>> for a given entity to make it more strict, then injecting the new 
>>>>> code will prevent us from cleanly deleting entities that do not 
>>>>> fulfill the validation criteria - which are precisely the ones 
>>>>> we'd like to delete. This is of course much independant from the 
>>>>> central issue I am addressing here, but your comments about this 
>>>>> would be appreciated too.
>>>>>
>>>>>
>>>>> Thanks -- Thierry
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> Devel at lists.planet-lab.org
>>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel



More information about the Devel mailing list