[Planetlab-devel] inter-plc federation - email uniqueness - proposed patch & discussion

Larry Peterson llp at CS.Princeton.EDU
Thu Jun 14 09:15:40 EDT 2007


Thierry,

I'll let Tony respond with details about the email patch,
but I wanted to comment on the larger peering interface.
I'd like to see us move towards an interface that separates
peers fetching slice info from node info (instead of the
single RefreshPeer call). We already have a GetNodes and a
GetSlivers call; these should suffice. Your thoughts?

Larry

On Jun 14, 2007, at 4:07 AM, Thierry Parmentelat wrote:

> Following up on this topic:
> I have been testing this change on both a real deployment -- the  
> one that runs the peering trial with the public PLC -- and in vitro  
> -- on a dedicated couple of myplcs that run the TestPeers.py code  
> -- and have not seen any drawback with this patch.
>
> I would appreciate an update from Princeton about your position on  
> this proposal, and I would be grateful if this change could make  
> its way until your production code at some point, so I could relax  
> the constraint on email for our users (so far we have been using  
> fake aliases in the @one-lab.org domain).
>
> -- Thierry
>
>
> Thierry Parmentelat wrote:
>> Hi Tony
>>
>> I think I have a working version for this email-namespace problem
>> Please find below the patch I am suggesting -- I have not  
>> mentioned the change in Nodes.py (about validate_last_contact)  
>> since I think you already merged it in the princeton codebase.
>>
>> The only additional trick is to artificially update the instances  
>> of persons as returned from GetPeerData. More specifically,  
>> GetPeerData returns person dicts where peer_id is set to None  
>> (since we've asked for the peering end's local persons), so I just  
>> need to set it to the local peer_id that depicts the peering end,  
>> and use the idea that I described last week, about having  
>> validate_email check for duplicates on the candidate person's  
>> peer_id only.
>>
>> ==
>> As I said I am now going to review GetPeerData, and try and narrow  
>> the scope of the fields that get cached for federation. For  
>> instance with the current code, and since the introduction of  
>> last_contact, I have almost all nodes that get refreshed during  
>> two successive calls to RefreshPeer; I suspect this rather  
>> unfriendly behaviour to be due to last_contact being changed  
>> frequently (how often is that supposed to be, btw ?)
>> I hope this review can also drastically improve the overall  
>> performance of RefreshPeer -- at least by reducing the size of the  
>> xml traffic -- but to that end I am back to a decent federation  
>> test setup so this will take a while.
>> So more on this later.
>> ==
>>
>> I've been thinking about the possible implications of this change  
>> on the web pages, but as far as I can tell at this stage, there  
>> should not be any in fact.
>>
>> -- Thierry
>>
>> $ svn diff
>> Index: PLC/Persons.py
>> ===================================================================
>> --- PLC/Persons.py      (revision 492)
>> +++ PLC/Persons.py      (working copy)
>> @@ -100,7 +100,13 @@
>>         if len(domain) < 2:
>>             raise invalid_email
>>
>> -        conflicts = Persons(self.api, [email])
>> +       # Thierry - June 11 2007
>> +       # check only against users on the same peer
>> +       if 'peer_id' in self:
>> +           namespace_peer_id = self['peer_id']
>> +       else:
>> +           namespace_peer_id = None
>> +        conflicts = Persons(self.api,  
>> {'email':email,'peer_id':namespace_peer_id})
>>         for person in conflicts:
>>             if 'person_id' not in self or self['person_id'] !=  
>> person['person_id']:
>>                 raise PLCInvalidArgument, "E-mail address already  
>> in use"
>> Index: PLC/Methods/RefreshPeer.py
>> ===================================================================
>> --- PLC/Methods/RefreshPeer.py  (revision 492)
>> +++ PLC/Methods/RefreshPeer.py  (working copy)
>> @@ -228,6 +228,10 @@
>>
>>         # Keyed on foreign person_id
>>         old_peer_persons = Persons(self.api, {'peer_id': peer_id},  
>> columns).dict('peer_person_id')
>> +       # artificially attach the persons returned by GetPeerData  
>> to the new peer
>> +       # this is because validate_email needs peer_id to be  
>> correct when checking for duplicates
>> +       for person in peer_tables['Persons']:
>> +           person['peer_id']=peer_id
>>         persons_at_peer = dict([(peer_person['person_id'],  
>> peer_person) \
>>                                 for peer_person in peer_tables 
>> ['Persons']])
>>
>>
>>
>> Tony Mack wrote:
>>> Hi Thierry,
>>>
>>> It seems this issue is a little tricky, but I think you are  
>>> headed in the right direction. I will be bringing up an alternate  
>>> plc soon, which will act as a test environment for new  
>>> development. We can use this alternate plc to help us test these  
>>> federation issues.
>>>
>>> BTW, I will update Nodes.py to validate last_contact, thanks for  
>>> pointing this out.
>>>
>>> -- Tony
>>>
>>> Thierry Parmentelat wrote:
>>>> Tony
>>>> I did some progress on this topic:
>>>> - the patch I sent does not work as is because as I suspected  
>>>> there are cases where 'peer_id' is not provided at all to  
>>>> validate_email
>>>> - the protected version where we filter against peer_id when it  
>>>> is provided, or against peer_id=None when it's not, is half  
>>>> good. With this version I can locally add an e-mail address that  
>>>> is already present from another peer
>>>> - but I cannot successfully cache a remote email that would have  
>>>> already locally registered. This is much more tricky because  
>>>> RefreshPeer turns out to call sync on a Person object that has  
>>>> no peer_id set yet.
>>>>
>>>> So please do not waste too much time at this stage on 'how' to  
>>>> implement this, I'll have to spend more time on this next week.
>>>> But I'd still like your inputs on how this should be tested,  
>>>> because it clearly could have weird side-effects.
>>>>
>>>>
>>>> In the meanwhile, there definitely is something that you might  
>>>> consider doing on your side, in Nodes.py
>>>>
>>>>     validate_date_created = Row.validate_timestamp
>>>>     validate_last_updated = Row.validate_timestamp
>>>> +    validate_last_contact = Row.validate_timestamp
>>>>
>>>>     def update_last_contact(self, commit = True):
>>>>
>>>> which allows the last_contact field that you recently added, to  
>>>> get correctly refreshed from the integer-date we get through  
>>>> GetpeerData and xmlrpc.
>>>>
>>>> By the way, I also plan on reviewing the set of data that flows  
>>>> through GetPeerData.
>>>> It turns out this last_contact field gets cached togother with a  
>>>> lot more data that is definitely not crucial to federation, and
>>>> I am pretty sure I can substantially improve the overall  
>>>> performance. This is another story, but until I do this review  
>>>> you'll need this patch so RefreshPeer can work on your side.
>>>>
>>>> -- Thierry
>>>>
>>>>
>>>> Thierry Parmentelat wrote:
>>>>> Hi Tony and all
>>>>>
>>>>> I am currently trying to address the following issue.
>>>>> The current status of the api & RefreshPeer thing has a major  
>>>>> drawback. During the transitional period where European users  
>>>>> will start switching from the Public PlanetLab at PLC to  
>>>>> PlanetLabEurope, there may be a need for a user to be  
>>>>> registered at both myplc's at the same time.
>>>>> In this particular case, the current software does not allow  
>>>>> the user to use the same email address on both ends. The  
>>>>> reasons for this are that
>>>>> - AddPerson uses validate_email in Persons.py, that in turn  
>>>>> checks for any Person with that new email address and raises an  
>>>>> exception if such an entry can be found
>>>>> - RefreshPeer basically uses the same validate_email code when  
>>>>> it tries to sync newly created local objects
>>>>> -  Note however that no hard-wired constraints in the DB is  
>>>>> enabled here
>>>>>
>>>>>
>>>>> I am currently validating the following patch in PLC/Persons.py
>>>>>         if len(domain) < 2:
>>>>>             raise invalid_email
>>>>>
>>>>> -        conflicts = Persons(self.api, [email])
>>>>> +       # Thierry - June 8 2007
>>>>> +       # check only against users on the same peer
>>>>> +       # xxx should we protect against cases where self does  
>>>>> not hold a 'peer_id' key ?
>>>>> +        conflicts = Persons(self.api,  
>>>>> {'email':email,'peer_id':self['peer_id']})
>>>>>         for person in conflicts:
>>>>>             if 'person_id' not in self or self['person_id'] !=  
>>>>> person['person_id']:
>>>>>
>>>>> As you see I just alter the uniqueness check by relaxing the  
>>>>> constraint, and prevent email duplication on a given peer only.
>>>>>
>>>>> I am having a hard time setting up a reasonable test  
>>>>> environment to check this feature as thoroughly as possible, so  
>>>>> I cannot commit on its correctness yet.
>>>>> I would appreciate to know your feelings about it anyway, that  
>>>>> might save me some time especially if this is a wrong track. So:
>>>>> - first I'd like to know whether you feel comfortable with that  
>>>>> patch and hear your comments if any. In particular do you think  
>>>>> there might be cases where validate_email can be called on a  
>>>>> 'self' that would not hold a 'peer_id' filed ?
>>>>> - and second, during my test so far I've noticed that this same  
>>>>> validate_email thing was also called when performing deletions.  
>>>>> I assume this applies to all types of objects. My feeling is  
>>>>> that this sounds awkward - for instance if one changes the  
>>>>> validation for a given entity to make it more strict, then  
>>>>> injecting the new code will prevent us from cleanly deleting  
>>>>> entities that do not fulfill the validation criteria - which  
>>>>> are precisely the ones we'd like to delete. This is of course  
>>>>> much independant from the central issue I am addressing here,  
>>>>> but your comments about this would be appreciated too.
>>>>>
>>>>>
>>>>> Thanks -- Thierry
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> Devel at lists.planet-lab.org
>>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel
>



More information about the Devel mailing list