[Planetlab-devel] inter-plc federation
- email uniqueness - proposed patch & discussion
Tony Mack
tmack at CS.Princeton.EDU
Thu Jun 14 11:52:13 EDT 2007
I will update our code with your patch soon.
Thierry Parmentelat wrote:
> Following up on this topic:
> I have been testing this change on both a real deployment -- the one
> that runs the peering trial with the public PLC -- and in vitro -- on
> a dedicated couple of myplcs that run the TestPeers.py code -- and
> have not seen any drawback with this patch.
>
> I would appreciate an update from Princeton about your position on
> this proposal, and I would be grateful if this change could make its
> way until your production code at some point, so I could relax the
> constraint on email for our users (so far we have been using fake
> aliases in the @one-lab.org domain).
>
> -- Thierry
>
>
> Thierry Parmentelat wrote:
>> Hi Tony
>>
>> I think I have a working version for this email-namespace problem
>> Please find below the patch I am suggesting -- I have not mentioned
>> the change in Nodes.py (about validate_last_contact) since I think
>> you already merged it in the princeton codebase.
>>
>> The only additional trick is to artificially update the instances of
>> persons as returned from GetPeerData. More specifically, GetPeerData
>> returns person dicts where peer_id is set to None (since we've asked
>> for the peering end's local persons), so I just need to set it to the
>> local peer_id that depicts the peering end, and use the idea that I
>> described last week, about having validate_email check for duplicates
>> on the candidate person's peer_id only.
>>
>> ==
>> As I said I am now going to review GetPeerData, and try and narrow
>> the scope of the fields that get cached for federation. For instance
>> with the current code, and since the introduction of last_contact, I
>> have almost all nodes that get refreshed during two successive calls
>> to RefreshPeer; I suspect this rather unfriendly behaviour to be due
>> to last_contact being changed frequently (how often is that supposed
>> to be, btw ?)
>> I hope this review can also drastically improve the overall
>> performance of RefreshPeer -- at least by reducing the size of the
>> xml traffic -- but to that end I am back to a decent federation test
>> setup so this will take a while.
>> So more on this later.
>> ==
>>
>> I've been thinking about the possible implications of this change on
>> the web pages, but as far as I can tell at this stage, there should
>> not be any in fact.
>>
>> -- Thierry
>>
>> $ svn diff
>> Index: PLC/Persons.py
>> ===================================================================
>> --- PLC/Persons.py (revision 492)
>> +++ PLC/Persons.py (working copy)
>> @@ -100,7 +100,13 @@
>> if len(domain) < 2:
>> raise invalid_email
>>
>> - conflicts = Persons(self.api, [email])
>> + # Thierry - June 11 2007
>> + # check only against users on the same peer
>> + if 'peer_id' in self:
>> + namespace_peer_id = self['peer_id']
>> + else:
>> + namespace_peer_id = None
>> + conflicts = Persons(self.api,
>> {'email':email,'peer_id':namespace_peer_id})
>> for person in conflicts:
>> if 'person_id' not in self or self['person_id'] !=
>> person['person_id']:
>> raise PLCInvalidArgument, "E-mail address already in
>> use"
>> Index: PLC/Methods/RefreshPeer.py
>> ===================================================================
>> --- PLC/Methods/RefreshPeer.py (revision 492)
>> +++ PLC/Methods/RefreshPeer.py (working copy)
>> @@ -228,6 +228,10 @@
>>
>> # Keyed on foreign person_id
>> old_peer_persons = Persons(self.api, {'peer_id': peer_id},
>> columns).dict('peer_person_id')
>> + # artificially attach the persons returned by GetPeerData to
>> the new peer
>> + # this is because validate_email needs peer_id to be correct
>> when checking for duplicates
>> + for person in peer_tables['Persons']:
>> + person['peer_id']=peer_id
>> persons_at_peer = dict([(peer_person['person_id'],
>> peer_person) \
>> for peer_person in
>> peer_tables['Persons']])
>>
>>
>>
>> Tony Mack wrote:
>>> Hi Thierry,
>>>
>>> It seems this issue is a little tricky, but I think you are headed
>>> in the right direction. I will be bringing up an alternate plc soon,
>>> which will act as a test environment for new development. We can use
>>> this alternate plc to help us test these federation issues.
>>>
>>> BTW, I will update Nodes.py to validate last_contact, thanks for
>>> pointing this out.
>>>
>>> -- Tony
>>>
>>> Thierry Parmentelat wrote:
>>>> Tony
>>>> I did some progress on this topic:
>>>> - the patch I sent does not work as is because as I suspected there
>>>> are cases where 'peer_id' is not provided at all to validate_email
>>>> - the protected version where we filter against peer_id when it is
>>>> provided, or against peer_id=None when it's not, is half good. With
>>>> this version I can locally add an e-mail address that is already
>>>> present from another peer
>>>> - but I cannot successfully cache a remote email that would have
>>>> already locally registered. This is much more tricky because
>>>> RefreshPeer turns out to call sync on a Person object that has no
>>>> peer_id set yet.
>>>>
>>>> So please do not waste too much time at this stage on 'how' to
>>>> implement this, I'll have to spend more time on this next week.
>>>> But I'd still like your inputs on how this should be tested,
>>>> because it clearly could have weird side-effects.
>>>>
>>>>
>>>> In the meanwhile, there definitely is something that you might
>>>> consider doing on your side, in Nodes.py
>>>>
>>>> validate_date_created = Row.validate_timestamp
>>>> validate_last_updated = Row.validate_timestamp
>>>> + validate_last_contact = Row.validate_timestamp
>>>>
>>>> def update_last_contact(self, commit = True):
>>>>
>>>> which allows the last_contact field that you recently added, to get
>>>> correctly refreshed from the integer-date we get through
>>>> GetpeerData and xmlrpc.
>>>>
>>>> By the way, I also plan on reviewing the set of data that flows
>>>> through GetPeerData.
>>>> It turns out this last_contact field gets cached togother with a
>>>> lot more data that is definitely not crucial to federation, and
>>>> I am pretty sure I can substantially improve the overall
>>>> performance. This is another story, but until I do this review
>>>> you'll need this patch so RefreshPeer can work on your side.
>>>>
>>>> -- Thierry
>>>>
>>>>
>>>> Thierry Parmentelat wrote:
>>>>> Hi Tony and all
>>>>>
>>>>> I am currently trying to address the following issue.
>>>>> The current status of the api & RefreshPeer thing has a major
>>>>> drawback. During the transitional period where European users will
>>>>> start switching from the Public PlanetLab at PLC to
>>>>> PlanetLabEurope, there may be a need for a user to be registered
>>>>> at both myplc's at the same time.
>>>>> In this particular case, the current software does not allow the
>>>>> user to use the same email address on both ends. The reasons for
>>>>> this are that
>>>>> - AddPerson uses validate_email in Persons.py, that in turn checks
>>>>> for any Person with that new email address and raises an exception
>>>>> if such an entry can be found
>>>>> - RefreshPeer basically uses the same validate_email code when it
>>>>> tries to sync newly created local objects
>>>>> - Note however that no hard-wired constraints in the DB is
>>>>> enabled here
>>>>>
>>>>>
>>>>> I am currently validating the following patch in PLC/Persons.py
>>>>> if len(domain) < 2:
>>>>> raise invalid_email
>>>>>
>>>>> - conflicts = Persons(self.api, [email])
>>>>> + # Thierry - June 8 2007
>>>>> + # check only against users on the same peer
>>>>> + # xxx should we protect against cases where self does not
>>>>> hold a 'peer_id' key ?
>>>>> + conflicts = Persons(self.api,
>>>>> {'email':email,'peer_id':self['peer_id']})
>>>>> for person in conflicts:
>>>>> if 'person_id' not in self or self['person_id'] !=
>>>>> person['person_id']:
>>>>>
>>>>> As you see I just alter the uniqueness check by relaxing the
>>>>> constraint, and prevent email duplication on a given peer only.
>>>>>
>>>>> I am having a hard time setting up a reasonable test environment
>>>>> to check this feature as thoroughly as possible, so I cannot
>>>>> commit on its correctness yet.
>>>>> I would appreciate to know your feelings about it anyway, that
>>>>> might save me some time especially if this is a wrong track. So:
>>>>> - first I'd like to know whether you feel comfortable with that
>>>>> patch and hear your comments if any. In particular do you think
>>>>> there might be cases where validate_email can be called on a
>>>>> 'self' that would not hold a 'peer_id' filed ?
>>>>> - and second, during my test so far I've noticed that this same
>>>>> validate_email thing was also called when performing deletions. I
>>>>> assume this applies to all types of objects. My feeling is that
>>>>> this sounds awkward - for instance if one changes the validation
>>>>> for a given entity to make it more strict, then injecting the new
>>>>> code will prevent us from cleanly deleting entities that do not
>>>>> fulfill the validation criteria - which are precisely the ones
>>>>> we'd like to delete. This is of course much independant from the
>>>>> central issue I am addressing here, but your comments about this
>>>>> would be appreciated too.
>>>>>
>>>>>
>>>>> Thanks -- Thierry
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> Devel at lists.planet-lab.org
>>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel
More information about the Devel
mailing list