[Planetlab-devel] inter-plc federation
- email uniqueness - proposed patch & discussion
Thierry Parmentelat
Thierry.Parmentelat at sophia.inria.fr
Thu Jun 14 12:28:05 EDT 2007
Hello Larry
Just to make things clear, we actually defined two methods. When
e.g. OneLab wants to refresh its own cached copy of the PlanetLab
database, then
(*) the OneLab instance of the API locally runs the
RefreshPeer('PlanetLab') call,
(*) that in turn invokes GetPeerData() that fetches data from the
PlanetLab end
So there are several things in your suggestion:
(*) the simplest: GetPeerData actually just performs a set of
Get<Objects>; the reasons why we defined this GetPeerData were
(1) for authentication reasons; we exchange data that may be sensitive,
so we needed and implemented a gpg-based authentication mechanism that
the Peer object uses when issuing its data request. So having a new
specific method allows to clearly define which data are exposed to
Peers. Also, GetPeerData does not support anything like filtering or
column selection, it's the provider peer that decides which data are
exposed to the caching mechanism.
(2) for performance, because each xmlrpm call is costly - but probably
this isn't so important - and convenience because we can return some
execution-time data that GetNodes and the like do not, of course
So on this aspect, my position would be, we should keep this idea of
another method for confidentiality, but we could very well split
GetPeerData into several GetPeerNodes, ... calls if we wanted to split
RefreshPeer into several distinct calls as you are suggesting.
(*) a bit less simple: RefreshPeer gets all the data corresponding to a
'consistent' state of the provider peer. That is to say, there cannot be
a case where e.g. a foreign slice refers to a foreign node that does not
exist. Now if you split RefreshPeer into two separate functions, you may
face situations where the collected data is less consistent, and I
cannot assess the impact on the code. Maybe Mark told you, but this
caching code is quite tricky, so unless you have a really good reason to
want this separation.. I should have before, but why would you want to
sync only half of the information you need ?
(*) way less simple: Right now - as part of GetPeerData - we use
GetSlices as opposed to GetSlivers to request to provider peer's data.
We had quite a long discussion with Mark about that. He was suggesting
GetSlivers too, and I eventually convinced him of the other option.
Think about it. Imagine you have one slice, say comon, on your 700
nodes. Now if I call GetSlivers I am going to get 700 times the same
information. Not too efficient (remember we are talking xmlrpc
messages). Worse still, there's nothing in this huge data that tells me
that, so I basically have to write quite a complex piece of code (that
would basically do a 'cartesian divide') so I can realize that there was
only one slice involved, and store this information in my database
schema - unless I accept to design another scheme for storing this
information (there's no such thing as a sliver in the database).
Using GetSlices has all advantages. Only the relevant data goes on the
wire; and I can basically store everything in my database as if it was a
local object; I just need a uniform means to mark objects as 'local' or
'foreign' and this is where the 'peer_id' comes in.
I am not too sure you actually meant all this, but you asked for my
thoughts so here you are.
Again maybe it would help if you elaborated on why you want this
separaration, that is somewhat artificial if you think in terms of
symetric peering.
-- Thierry
Larry Peterson wrote:
> Thierry,
>
> I'll let Tony respond with details about the email patch,
> but I wanted to comment on the larger peering interface.
> I'd like to see us move towards an interface that separates
> peers fetching slice info from node info (instead of the
> single RefreshPeer call). We already have a GetNodes and a
> GetSlivers call; these should suffice. Your thoughts?
>
> Larry
>
> On Jun 14, 2007, at 4:07 AM, Thierry Parmentelat wrote:
>
>> Following up on this topic:
>> I have been testing this change on both a real deployment -- the one
>> that runs the peering trial with the public PLC -- and in vitro -- on
>> a dedicated couple of myplcs that run the TestPeers.py code -- and
>> have not seen any drawback with this patch.
>>
>> I would appreciate an update from Princeton about your position on
>> this proposal, and I would be grateful if this change could make its
>> way until your production code at some point, so I could relax the
>> constraint on email for our users (so far we have been using fake
>> aliases in the @one-lab.org domain).
>>
>> -- Thierry
>>
>>
>> Thierry Parmentelat wrote:
>>> Hi Tony
>>>
>>> I think I have a working version for this email-namespace problem
>>> Please find below the patch I am suggesting -- I have not mentioned
>>> the change in Nodes.py (about validate_last_contact) since I think
>>> you already merged it in the princeton codebase.
>>>
>>> The only additional trick is to artificially update the instances of
>>> persons as returned from GetPeerData. More specifically, GetPeerData
>>> returns person dicts where peer_id is set to None (since we've asked
>>> for the peering end's local persons), so I just need to set it to
>>> the local peer_id that depicts the peering end, and use the idea
>>> that I described last week, about having validate_email check for
>>> duplicates on the candidate person's peer_id only.
>>>
>>> ==
>>> As I said I am now going to review GetPeerData, and try and narrow
>>> the scope of the fields that get cached for federation. For instance
>>> with the current code, and since the introduction of last_contact, I
>>> have almost all nodes that get refreshed during two successive calls
>>> to RefreshPeer; I suspect this rather unfriendly behaviour to be due
>>> to last_contact being changed frequently (how often is that supposed
>>> to be, btw ?)
>>> I hope this review can also drastically improve the overall
>>> performance of RefreshPeer -- at least by reducing the size of the
>>> xml traffic -- but to that end I am back to a decent federation test
>>> setup so this will take a while.
>>> So more on this later.
>>> ==
>>>
>>> I've been thinking about the possible implications of this change on
>>> the web pages, but as far as I can tell at this stage, there should
>>> not be any in fact.
>>>
>>> -- Thierry
>>>
>>> $ svn diff
>>> Index: PLC/Persons.py
>>> ===================================================================
>>> --- PLC/Persons.py (revision 492)
>>> +++ PLC/Persons.py (working copy)
>>> @@ -100,7 +100,13 @@
>>> if len(domain) < 2:
>>> raise invalid_email
>>>
>>> - conflicts = Persons(self.api, [email])
>>> + # Thierry - June 11 2007
>>> + # check only against users on the same peer
>>> + if 'peer_id' in self:
>>> + namespace_peer_id = self['peer_id']
>>> + else:
>>> + namespace_peer_id = None
>>> + conflicts = Persons(self.api,
>>> {'email':email,'peer_id':namespace_peer_id})
>>> for person in conflicts:
>>> if 'person_id' not in self or self['person_id'] !=
>>> person['person_id']:
>>> raise PLCInvalidArgument, "E-mail address already in
>>> use"
>>> Index: PLC/Methods/RefreshPeer.py
>>> ===================================================================
>>> --- PLC/Methods/RefreshPeer.py (revision 492)
>>> +++ PLC/Methods/RefreshPeer.py (working copy)
>>> @@ -228,6 +228,10 @@
>>>
>>> # Keyed on foreign person_id
>>> old_peer_persons = Persons(self.api, {'peer_id': peer_id},
>>> columns).dict('peer_person_id')
>>> + # artificially attach the persons returned by GetPeerData to
>>> the new peer
>>> + # this is because validate_email needs peer_id to be correct
>>> when checking for duplicates
>>> + for person in peer_tables['Persons']:
>>> + person['peer_id']=peer_id
>>> persons_at_peer = dict([(peer_person['person_id'],
>>> peer_person) \
>>> for peer_person in
>>> peer_tables['Persons']])
>>>
>>>
>>>
>>> Tony Mack wrote:
>>>> Hi Thierry,
>>>>
>>>> It seems this issue is a little tricky, but I think you are headed
>>>> in the right direction. I will be bringing up an alternate plc
>>>> soon, which will act as a test environment for new development. We
>>>> can use this alternate plc to help us test these federation issues.
>>>>
>>>> BTW, I will update Nodes.py to validate last_contact, thanks for
>>>> pointing this out.
>>>>
>>>> -- Tony
>>>>
>>>> Thierry Parmentelat wrote:
>>>>> Tony
>>>>> I did some progress on this topic:
>>>>> - the patch I sent does not work as is because as I suspected
>>>>> there are cases where 'peer_id' is not provided at all to
>>>>> validate_email
>>>>> - the protected version where we filter against peer_id when it is
>>>>> provided, or against peer_id=None when it's not, is half good.
>>>>> With this version I can locally add an e-mail address that is
>>>>> already present from another peer
>>>>> - but I cannot successfully cache a remote email that would have
>>>>> already locally registered. This is much more tricky because
>>>>> RefreshPeer turns out to call sync on a Person object that has no
>>>>> peer_id set yet.
>>>>>
>>>>> So please do not waste too much time at this stage on 'how' to
>>>>> implement this, I'll have to spend more time on this next week.
>>>>> But I'd still like your inputs on how this should be tested,
>>>>> because it clearly could have weird side-effects.
>>>>>
>>>>>
>>>>> In the meanwhile, there definitely is something that you might
>>>>> consider doing on your side, in Nodes.py
>>>>>
>>>>> validate_date_created = Row.validate_timestamp
>>>>> validate_last_updated = Row.validate_timestamp
>>>>> + validate_last_contact = Row.validate_timestamp
>>>>>
>>>>> def update_last_contact(self, commit = True):
>>>>>
>>>>> which allows the last_contact field that you recently added, to
>>>>> get correctly refreshed from the integer-date we get through
>>>>> GetpeerData and xmlrpc.
>>>>>
>>>>> By the way, I also plan on reviewing the set of data that flows
>>>>> through GetPeerData.
>>>>> It turns out this last_contact field gets cached togother with a
>>>>> lot more data that is definitely not crucial to federation, and
>>>>> I am pretty sure I can substantially improve the overall
>>>>> performance. This is another story, but until I do this review
>>>>> you'll need this patch so RefreshPeer can work on your side.
>>>>>
>>>>> -- Thierry
>>>>>
>>>>>
>>>>> Thierry Parmentelat wrote:
>>>>>> Hi Tony and all
>>>>>>
>>>>>> I am currently trying to address the following issue.
>>>>>> The current status of the api & RefreshPeer thing has a major
>>>>>> drawback. During the transitional period where European users
>>>>>> will start switching from the Public PlanetLab at PLC to
>>>>>> PlanetLabEurope, there may be a need for a user to be registered
>>>>>> at both myplc's at the same time.
>>>>>> In this particular case, the current software does not allow the
>>>>>> user to use the same email address on both ends. The reasons for
>>>>>> this are that
>>>>>> - AddPerson uses validate_email in Persons.py, that in turn
>>>>>> checks for any Person with that new email address and raises an
>>>>>> exception if such an entry can be found
>>>>>> - RefreshPeer basically uses the same validate_email code when it
>>>>>> tries to sync newly created local objects
>>>>>> - Note however that no hard-wired constraints in the DB is
>>>>>> enabled here
>>>>>>
>>>>>>
>>>>>> I am currently validating the following patch in PLC/Persons.py
>>>>>> if len(domain) < 2:
>>>>>> raise invalid_email
>>>>>>
>>>>>> - conflicts = Persons(self.api, [email])
>>>>>> + # Thierry - June 8 2007
>>>>>> + # check only against users on the same peer
>>>>>> + # xxx should we protect against cases where self does not
>>>>>> hold a 'peer_id' key ?
>>>>>> + conflicts = Persons(self.api,
>>>>>> {'email':email,'peer_id':self['peer_id']})
>>>>>> for person in conflicts:
>>>>>> if 'person_id' not in self or self['person_id'] !=
>>>>>> person['person_id']:
>>>>>>
>>>>>> As you see I just alter the uniqueness check by relaxing the
>>>>>> constraint, and prevent email duplication on a given peer only.
>>>>>>
>>>>>> I am having a hard time setting up a reasonable test environment
>>>>>> to check this feature as thoroughly as possible, so I cannot
>>>>>> commit on its correctness yet.
>>>>>> I would appreciate to know your feelings about it anyway, that
>>>>>> might save me some time especially if this is a wrong track. So:
>>>>>> - first I'd like to know whether you feel comfortable with that
>>>>>> patch and hear your comments if any. In particular do you think
>>>>>> there might be cases where validate_email can be called on a
>>>>>> 'self' that would not hold a 'peer_id' filed ?
>>>>>> - and second, during my test so far I've noticed that this same
>>>>>> validate_email thing was also called when performing deletions. I
>>>>>> assume this applies to all types of objects. My feeling is that
>>>>>> this sounds awkward - for instance if one changes the validation
>>>>>> for a given entity to make it more strict, then injecting the new
>>>>>> code will prevent us from cleanly deleting entities that do not
>>>>>> fulfill the validation criteria - which are precisely the ones
>>>>>> we'd like to delete. This is of course much independant from the
>>>>>> central issue I am addressing here, but your comments about this
>>>>>> would be appreciated too.
>>>>>>
>>>>>>
>>>>>> Thanks -- Thierry
>>>>>>
>>>>>> _______________________________________________
>>>>>> Devel mailing list
>>>>>> Devel at lists.planet-lab.org
>>>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> Devel at lists.planet-lab.org
>>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at lists.planet-lab.org
>>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at lists.planet-lab.org
>>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel
>>
>
> _______________________________________________
> Devel mailing list
> Devel at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/devel
More information about the Devel
mailing list