Discussion:
[Rock-dev] Questions Regarding Typelib
Janosch Machowinski
2014-06-05 11:05:49 UTC
Permalink
Hey,
I have some questions regarding how Typelib works.

Here is what I found out; correct me if this is wrong:

It loads a type description, either from the installed TLB files,
or from a log stream. From this a Typelib::Type is generated.

To store or load a type, one needs to create a Typelib::Value.
A Typelib value is an unpacked version of the Type, plus the type description.

dump() packs the value into a data stream, and load() unpacks it from a
data stream.

Here is my first question: how can I load non-POD types? I suppose
I must supply a preinitialized type as the target pointer?

Is there a way of doing this anonymously?

How do I check (in C++) if the local type and the stream type are the same?
I suppose I load the system TLB and check it against the stream TLB; is there
already a function for this?

Greetings
Janosch
--
Dipl. Inf. Janosch Machowinski
SAR & Security Robotics

Universität Bremen
FB 3 - Mathematics and Computer Science
Robotics Group
Robert-Hooke-Straße 1
28359 Bremen, Germany

Switchboard: +49 421 178 45-6611

Visiting address of the branch office:
Robert-Hooke-Straße 5
28359 Bremen, Germany

Tel.: +49 421 178 45-6614
Reception: +49 421 178 45-6600
Fax: +49 421 178 45-4150
E-Mail: jmachowinski at informatik.uni-bremen.de

Further information: http://www.informatik.uni-bremen.de/robotik
Sylvain Joyeux
2014-06-07 14:29:03 UTC
Permalink
On Thu, Jun 5, 2014 at 1:05 PM, Janosch Machowinski <
Post by Janosch Machowinski
It loads a type description, either from the installed TLB files,
or from a log stream. From this a Typelib::Type is generated.
Not exactly. TLB files represent a registry, i.e. a self-consistent set of
types (pocolog streams simply embed an XML document in the same format as
the .tlb files). What you get from the XML description is therefore a
Typelib::Registry object, from which you can query the Typelib::Type
object(s) that describe your type.
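The registry-then-type relationship can be pictured with a minimal mock (the real Typelib API differs in names and signatures; `MockRegistry` and its `get` method are purely illustrative, not Typelib calls):

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Minimal stand-ins for Typelib::Type and Typelib::Registry, only to
// illustrate the relationship described above: a TLB file (or the XML
// document embedded in a pocolog stream) yields a whole registry, and
// individual type descriptions are queried from it by name.
struct TypeInfo {
    std::string name;
    std::size_t size;  // binary layout size of a value of this type
};

class MockRegistry {
public:
    void add(const std::string& name, std::size_t size) {
        types_[name] = std::make_shared<TypeInfo>(TypeInfo{name, size});
    }
    // Query a type by its full name; returns nullptr when unknown.
    std::shared_ptr<TypeInfo> get(const std::string& name) const {
        auto it = types_.find(name);
        return it == types_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, std::shared_ptr<TypeInfo>> types_;
};

// Mimics loading a TLB/XML description: in real Typelib the parser
// populates the registry; here we hard-code one hypothetical entry.
inline MockRegistry load_registry_from_tlb() {
    MockRegistry registry;
    registry.add("/base/Time", 8);
    return registry;
}
```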
Post by Janosch Machowinski
To store or load a type, one needs to create a Typelib::Value
A typlib value is a unpacked version of the Type, plus the type description
You store and load a value, not a type. Types describe values.

A value is a typed pointer, i.e. a pointer to memory with the Typelib::Type
object that describes the binary layout of said memory.
Post by Janosch Machowinski
Here is my first question, how can I load non POD types. I suppose,
I must supply a preinitialized type as target Pointer ?
Is there a way of doing this anonymously ?
Yes. You must create a big enough buffer (use type.size() to know how big)
and initialize it with Typelib::init().
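That allocate-then-init pattern can be sketched with a mock (here `MockType::size()` and `init()` stand in for the real `type.size()` and `Typelib::init()`, whose exact signatures may differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mock of a runtime type description: it knows the binary size of a
// value and how to initialize raw memory into a valid (empty) value.
struct MockType {
    std::size_t layout_size;

    std::size_t size() const { return layout_size; }

    // Stand-in for Typelib::init(): zero the buffer so that embedded
    // counters/containers start out in a well-defined empty state.
    void init(std::uint8_t* memory) const {
        for (std::size_t i = 0; i < layout_size; ++i)
            memory[i] = 0;
    }
};

// Anonymously create a value for a type known only at runtime:
// allocate type.size() bytes, then initialize them.
inline std::vector<std::uint8_t> make_value(const MockType& type) {
    std::vector<std::uint8_t> buffer(type.size());
    type.init(buffer.data());
    return buffer;  // pairing this buffer with `type` gives a typed value
}
```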

Post by Janosch Machowinski
How do I check (in c++) if the local type and the stream type are the same ?
I suppose I load the system tlb and check it against the stream tlb, is
there
already a function for this ?
For your application, type.isSame(*other_type) will do the trick. Do NOT
use == unless you know the two types are within the same registry. Another
way is to merge the two registries: the merge will fail if some types are
found with the same name but different definitions (thus ensuring that all
the types are the same).
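Why `==` is unsafe across registries can be illustrated with a mock (the `definition` field is a hypothetical stand-in for a type's full structural description; real Typelib types are compared structurally by `isSame`):

```cpp
#include <string>

// Mock type description. Object identity (the address) is only
// meaningful within a single registry; the definition string stands in
// for the actual binary-layout description.
struct TypeDescription {
    std::string name;
    std::string definition;

    // Structural comparison, like type.isSame(other): safe across
    // registries because it compares definitions, not object identity.
    bool isSame(const TypeDescription& other) const {
        return name == other.name && definition == other.definition;
    }
};
```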

Warning: there is no such thing as a 'system tlb'. TLBs are saved per
orogen component to make sure that they indeed describe the types that the
component uses (very important in the case of partial rebuilds after a type
change).

Given the stream of commits in pocolog_cpp and your questions, you are
obviously reimplementing in C++ big chunks of what is already done in Ruby.
Care to share your plans with the group?

Sylvain
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.dfki.de/pipermail/rock-dev/attachments/20140607/f8ad2d78/attachment.htm
Janosch Machowinski
2014-06-07 21:13:59 UTC
Permalink
Hey,
I was annoyed by the current playback speed for log data,
which is 0.6x realtime on my Core i7. Therefore I invested
a day to look into the pocolog C++ implementation.

Indexing and log stream alignment are a lot faster with
the C++ implementation. Opening an index file now takes
~30 ms. Stream alignment for a 3 GB data set with
3 million samples takes around 2 seconds.

The raw playback speed, meaning loading all data from
disk for all streams, is currently at 24x realtime.

The next step would be to create a binding for pocolog_cpp
and replace the pocolog implementation with it.

I am unsure about the interface to the Ruby part though.
My first idea was to do the unpacking of the types in C++
and only give out packed types if they do not match the
local types. In general I don't really like giving out packed
types in the same way as unpacked ones.

Anyway, giving out unpacked types seems not to work,
as one needs the RTT typesystem for unpacking (if I
understood the code right...).
How would I create a packed Typelib::Value on the C++
side?
Post by Sylvain Joyeux
Yes. You must create a big enough buffer (use type.size() to know how
big) and initialize it with Typelib::init()
How does Typelib::init() work? I didn't find code that would set up the
virtual table
or the code segment...

Greetings
Janosch
Sylvain Joyeux
2014-06-08 09:01:30 UTC
Permalink
On Sat, Jun 7, 2014 at 11:13 PM, Janosch Machowinski <
Post by Janosch Machowinski
Anyway, giving out unpacked types, seems not to work,
as one needs the RTT typesystem for unpacking (if I
understood the code right...)
How would I create a packed typelib::Value on the C++
side ?
What do you call "packed" and "unpacked" there? /me very confused

Sylvain
Janosch Machowinski
2014-06-08 09:12:55 UTC
Permalink
A packed type is the representation on disk.
An unpacked type would be a correctly initialized type
with the data from the packed type copied to the correct
offsets.
Greetings
Janosch
Janosch Machowinski
2014-06-08 19:35:25 UTC
Permalink
Hm,
after thinking about it for a while, I figured there is nothing like a
packed type.
I suppose you always unpack the data from disk into the given memory area.
This has the advantage of using the same code path later on for
handling the member access.
So for the Ruby binding part, I would simply unpack into the given
memory area, and not care at all whether this is a system type or not.

I don't really like this, as one would rely on the caller to make sure
that the given memory area is preinitialized properly.

Just out of curiosity, is there a way in Typelib to get a preinitialized
memory area for a type containing a virtual method?
Greetings
Janosch
_______________________________________________
Rock-dev mailing list
Rock-dev at dfki.de
http://www.dfki.de/mailman/cgi-bin/listinfo/rock-dev
Sylvain Joyeux
2014-06-09 20:10:53 UTC
Permalink
Post by Janosch Machowinski
I don't really like this, as one would rely on the caller to make sure
that the given memory area is preinitialized properly.
This is the only proper way one can design an API like this, as we often
want to be able to pass the same sample object over and over again to avoid
allocation.
Post by Janosch Machowinski
Just out of curiosity, is there a way in typelib, to get a preinitialized
memory area for a type containing a virtual method ?
No. Typelib is strictly limited to non-polymorphic types. Moreover, pocolog
should never assume that the marshalled type has the same layout as the
actual C++ type, as we want to be able to read / upgrade old log files. The
compatibility tests are done only when we want to hit a C++ layer (RTT or
Qt).

Sylvain
Sylvain Joyeux
2014-06-08 20:27:50 UTC
Permalink
Could you post the result of running pocolog on your test log files? Last
time I looked, low-level access in pocolog was *not* the performance hog;
the stream aligner was (which -- I guess -- means there is a bug
somewhere). I'd like to verify that.

Sylvain


Sylvain Joyeux
2014-06-08 21:16:01 UTC
Permalink
Oh ... I would also need the size of each stream's sample (including the
size of vectors if there are any) ...

About index loading: the way the index was marshalled needed to be changed
(but was not) after the change you made to indexing (i.e. making indexes
dense). A 3-line patch improves performance quite a lot already. Alignment
is already pretty good on my test file (~4s).

I've generated a 6 GB log file with 3 streams, ~650k samples in 1 minute of
logical time. Reading the streams using the Ruby stream aligner yields
~115x if I don't read the data sample, ~8x if I do. To me, this definitely
means that the performance problem is not pocolog's low-level handling but
I/O (I have "images" of 1 MB, so I/O is actually a big performance
problem), typelib demarshalling (there is one thing in this code path that
currently sucks big time in the typelib-ruby binding related to
demarshalling; I did not measure its actual impact, though) and the fact
that I hit the Ruby GC pretty hard by creating 650k samples and 650k
intermediate buffers.

The point is: as soon as you get Ruby bindings, you'll have the same
issues (GC, typelib demarshalling and I/O will not go away). The only thing
that goes away is the creation of 650k intermediate buffers, and that can
already be removed with very little work.

Now, I do get 8x replay speed when demarshalling the data. That's very far
from 0.6. When you talk about bad performance, were you using the replay
UI?

For reference:

pocolog test.0.log < this is a 5.9G file

Stream images [/images] < these are 1M samples
5994 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014 22:46:41
[0:09:59.299]
Stream motors [/motors] < these are 100 bytes samples
599708 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014 22:46:41
[0:09:59.706]
Stream pose_samples [/pose_samples] < these are 512 bytes samples
59970 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014 22:46:41
[0:09:59.690]

Sylvain
Janosch Machowinski
2014-06-09 10:04:13 UTC
Permalink
Post by Sylvain Joyeux
Oh ... I would also need the size of each stream's sample (including
the size of vectors if there are any) ...
About index loading: the way the index was marshalled needed to be
changed (but was not) after the change you made to indexing (i.e.
making indexes dense). A 3-line patch improves performance quite a lot
already. Alignment is already pretty good on my test file (~4s).
It gets worse with the number of streams. Try a test case with ~60
streams. That is where the performance really
drops, and that is the 'real world' test case...
Post by Sylvain Joyeux
I've generated a 6G log file with 3 streams, ~650k samples in 1 minute
of logical time, reading the streams using the Ruby stream aligner
yields ~115x if I don't read the data sample, ~8x if I do. To me, this
definitely means that the performance problem is not pocolog's
low-level handling but both I/O (I have "images" of 1M, so I/O is
actually a big performance problem), typelib demarshalling (there is
one thing in this code path that sucks big time currently in the
typelib-ruby binding related to demarshalling, did not measure the
actual impact of it, though) and the fact that I hit the ruby GC
pretty hard by creating 650k samples and 650k intermediate buffers.
To sum it up, the C++ implementation still outperforms the Ruby
implementation by a factor of 4 in replay speed.
Post by Sylvain Joyeux
The point is: as soon as you'll get ruby bindings, you'll have the
same issues (GC, typelib demarshalling and I/O will not go away). The
only thing that goes away is the creation of 650k intermediate
buffers, but that can be removed with very little work already.
So, what are you saying? That we should not go for Ruby bindings any more?
Post by Sylvain Joyeux
Now, I do get 8x replay speed when demarshalling the data. That's very
far from 0.6. When you talk about bad performance, were you using the
replay UI ?
Replay scripts and the UI. Using rock-replay is a pain right now.
Post by Sylvain Joyeux
pocolog test.0.log < this is a 5.9G file
Stream images [/images] < these are 1M samples
5994 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014 22:46:41
[0:09:59.299]
Stream motors [/motors] < these are 100 bytes samples
599708 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014
22:46:41 [0:09:59.706]
Stream pose_samples [/pose_samples] < these are 512 bytes samples
59970 samples from Sun 08/06/2014 22:36:42 to Sun 08/06/2014
22:46:41 [0:09:59.690]
Sylvain
Do you have a real log set somewhere for tests?
Should I upload one tomorrow?
Greetings
Janosch
Sylvain Joyeux
2014-06-09 14:24:47 UTC
Permalink
On Mon, Jun 9, 2014 at 12:04 PM, Janosch Machowinski <
Post by Sylvain Joyeux
Oh ... I would also need the size of each stream's sample (including the
size of vectors if there are any) ...
About index loading: the way the index was marshalled needed to be
changed (but was not) after the change you made to indexing (i.e. making
indexes dense). A 3-line patch improves performance quite a lot already.
Alignment is already pretty good on my test file (~4s).
It gets worse with the number of streams. Try a testcase with ~60 streams.
There the performance really
drops, and this is the 'reality' test case...
Created a dataset of one minute with 100 streams. Each stream is at 100 Hz,
so that's 600k samples. It took 4.6 seconds to generate the index and 0.8
seconds to load the file index (from a warm cache, so probably with little
I/O overhead).

C++ *is* faster. Of course it is. From what I see, not fast enough to
justify the refactoring that you are proposing.

It would be a lot more interesting to find out why using Vizkit and log
control kills performance so much, and how we could optimize the typelib
parts (which are C++ already!)

Again, you are *not* giving the right measurements. Speed factors and
durations are meaningless if we don't know how many samples each stream
has and how long each stream lasts. Just "it is 24x faster" means
nothing.

Sylvain
Janosch Machowinski
2014-06-09 14:47:03 UTC
Permalink
Post by Sylvain Joyeux
On Mon, Jun 9, 2014 at 12:04 PM, Janosch Machowinski
Oh ... I would also need the size of each stream's sample
(including the size of vectors if there are any) ...
About index loading: the way the index was marshalled needed
to be changed (but was not) after the change you made to
indexing (i.e. making indexes dense). A 3-line patch improves
performance quite a lot already. Alignment is already pretty
good on my test file (~4s).
It gets worse with the number of streams. Try a testcase with ~60
streams. There the performance really
drops, and this is the 'reality' test case...
Created a dataset of one minute with 100 streams. Each stream is at
100Hz, so that's 600k samples. It took 4.6 seconds to generate the
index and 0.8 seconds to load the file index (from warm cache, so with
probably little I/O overhead).
How long did the stream alignment take? This is the part where the problem
usually is, as you can't get better than
O(s · log n) there, where n is the number of streams and s the number of
samples.
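That bound comes from the usual heap-merge formulation of stream alignment. This is a generic sketch of the technique (not the pocolog_cpp implementation): merge n sorted per-stream timestamp indexes via a min-heap, so each of the s samples costs O(log n).

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// One read cursor per stream: which stream, position in its index,
// and the timestamp of the sample currently under the cursor.
struct Cursor {
    std::size_t stream;
    std::size_t pos;
    double time;
};

struct LaterFirst {
    bool operator()(const Cursor& a, const Cursor& b) const {
        return a.time > b.time;  // makes priority_queue a min-heap on time
    }
};

// streams[i] is the sorted timestamp index of stream i; the result is
// the (stream, time) sequence in global playback order.
inline std::vector<std::pair<std::size_t, double>>
align(const std::vector<std::vector<double>>& streams) {
    std::priority_queue<Cursor, std::vector<Cursor>, LaterFirst> heap;
    for (std::size_t i = 0; i < streams.size(); ++i)
        if (!streams[i].empty())
            heap.push({i, 0, streams[i][0]});

    std::vector<std::pair<std::size_t, double>> order;
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        order.emplace_back(c.stream, c.time);
        // Advance the cursor within its stream and re-insert: O(log n).
        if (c.pos + 1 < streams[c.stream].size())
            heap.push({c.stream, c.pos + 1, streams[c.stream][c.pos + 1]});
    }
    return order;
}
```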
Post by Sylvain Joyeux
C++ *is* faster. Of course it is. From what I see, not fast enough to
justify the refactoring that you are proposing.
Ohh yes, it does. Recently I did a lot of localization debugging. You
can't skip over data in this case (and in a lot of other use cases too),
which means you have to replay the whole log stream. If the replay is twice
as fast, you need half
the time for debugging. So in my eyes it is 100% worth the effort.
Post by Sylvain Joyeux
Would be a lot more interesting to find out why using Vizkit and log
control kills performance so much and how we could optimize the
typelib parts (which are C++ already !)
One step after another...
Post by Sylvain Joyeux
Again, you are *not* giving the right measurements. Speed factors and
durations are meaningless if we don't know how many samples each
stream has, and how long each stream lasts. Just "it is 24x times
faster" means nothing.
You have the C++ implementation; just run multiIndexTester on your
test data and compare the results.
Greetings
Janosch
Sylvain Joyeux
2014-06-09 15:04:50 UTC
Permalink
Post by Sylvain Joyeux
Created a dataset of one minute with 100 streams. Each stream is at
100Hz, so that's 600k samples. It took 4.6 seconds to generate the index
and 0.8 seconds to load the file index (from warm cache, so with probably
little I/O overhead).
How long did the stream alignment take ? This is the part were usually the
problem is, as you can't get better than
O((log n)*s) there, were n is the number of streams and s the amount of
samples.
??? What are you talking about? That is only the asymptotic complexity. The
alignment takes 4.6 seconds.
Post by Sylvain Joyeux
C++ *is* faster. Of course it is. From what I see, not fast enough to
justify the refactoring that you are proposing.
Ohh yes, it does. Recently I did a log of localization debugging. You
can't jump data in this case (and a lot of other usecases too) which means
you have to replay the whole logstream. If the replay is double as fast, it
means you need half
the time for debugging. So in my eyes it is 100% worth the effort.
Except that making the part that currently takes only 10% of the replay
time twice as fast will make the overall process just 5% faster. Even
making it 100 times faster will only save about 9%. From what I see, this
is what you are attempting, as what takes the most time is I/O and typelib
demarshalling.

In other words: you are attempting to optimize something without having
done any profiling. This is a cardinal sin.
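The 5%/9% figures follow from Amdahl's law: if a fraction p of the total run time is made s times faster, the time saved is p - p/s of the total. A quick check of the numbers used above:

```cpp
// Amdahl's law, expressed as the fraction of total run time saved when
// a part taking fraction p of the time becomes s times faster.
inline double fraction_saved(double p, double s) {
    return p - p / s;
}
```

With p = 0.10: doubling the speed (s = 2) saves 5% of the total, and even s = 100 saves just under 10%.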


Sylvain
Janosch Machowinski
2014-06-09 17:08:42 UTC
Permalink
Seriously, go to hell.
You know what the problem with you is: whenever someone is motivated
to fix a problem with Rock, and it is not done the way you think is
best, you start bashing around. Btw, a correct response would be: great,
you looked into it and made it faster, go on.

A lot of people think that the log replay tools are slow as hell, and that
it is annoying to wait 20 seconds every time rock-replay starts up.

Just to finish this: it is slow as hell. I just did tests with log data I
generated
using Mars:
time rock-replay ~/Arbeit/asguard/bundles/asguard/logs/current
pocolog.rb[INFO]: building index ...
pocolog.rb[INFO]: done
pocolog.rb[INFO]: building index ...
pocolog.rb[INFO]: done
pocolog.rb[INFO]: building index ...
pocolog.rb[INFO]: done
pocolog.rb[INFO]: building index ...
pocolog.rb[INFO]: done
pocolog.rb[INFO]: building index ...
pocolog.rb[INFO]: done
Aligning streams. This can take a long time
pocolog.rb[INFO]: Got 77 streams with 295166 samples
pocolog.rb[INFO]: Stream Aligner index created

real 0m20.860s
user 0m19.605s
sys 0m1.008s

time ./multiIndexer ~/Arbeit/asguard/bundles/asguard/logs/current/*.log
Building multi file index
100% Done
Processed 295169 of 295169 samples

real 0m1.089s
user 0m0.780s
sys 0m0.304s

This is a huge speedup, and it is worth it.
Janosch
Alexander Duda
2014-06-09 17:46:45 UTC
Permalink
Post by Janosch Machowinski
Seriously, got to hell.
You know what s the problem with you, whenever one is motivated,
to fix a problem with rock and it not done the way you think it is the
best, you start bashing around. Btw, a correct response would be, great,
you looked into it and made it faster, go on.
Seriously, it is great that you are looking into it. But you are also asking other people to help you, costing them quite a bit of their free time. Therefore, I think it is also fair for them to criticize specific steps which might have little impact on the overall replay speed.

At the beginning, you were complaining about the replay speed, not about the loading and alignment which you are currently improving. And like Sylvain said, I also do not believe that the replay speed is so slow because of the current pocolog implementation. It is slow because of how rock-replay works (at least one callback per sample, etc.).

Anyway, I am a bit puzzled why you get a different number of samples for the same log files.

Greets Alex
--
Dipl.-Ing. Alexander Duda
Underwater Robotics
Robotics Innovation Center

Head office, Bremen site:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-6620
Switchboard: +49 421 178 45-0
Fax: +49 421 178 45-4150 (please mark faxes with the recipient's name)
E-Mail: Alexander.Duda at dfki.de

Further information: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Straße 122, D-67663 Kaiserslautern
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Chairman), Dr. Walter Olthoff
Chairman of the supervisory board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Company seat: Kaiserslautern (HRB 2313)
VAT ID no.: DE 148646973
Tax number: 19/673/0060/3

Janosch Machowinski
2014-06-09 19:51:39 UTC
Permalink
Post by Alexander Duda
Seriously, it is great that you are looking into it. But you are also
asking other people to help you costing them quite a bit of their free
time. Therefore, I think it is also fair for them to criticize
specific steps which might have little impact on the overall replay speed.
At the beginning, you were complaining about the replay speed and not
about the loading and alignment which you are currently improving. And
like Sylvain said. I also do not believe that the replay speed is so
slow because of the current pocolog implementation. It is slow because
of how rock-replay works (at least one callback per sample etc.).
Yep, the replay speed is still my main concern, but while fixing this, I
thought one could fix the annoying indexing time along the way as well...
As removing streams from the replay increased the replay time (using
trace(false)), I was pretty much suspecting the stream aligner... (and was
right, see below)

And here is the point I criticize about the bashing from Sylvain: he didn't
have a point at all, and started bashing me for no reason. I have good
reasons for the things I do, and I really hate that I need to defend every
step I make.
Post by Alexander Duda
Anyway, I am a bit puzzled why you have different number of samples for the same log files.
The log file is truncated; my implementation does not check for
truncation at the end of the stream atm.

Something in the pocolog implementation is the problem.
Using only the pocolog stream aligner to replay all samples
(I hacked the log implementation to get the streamAligner directly,
and then called step on it in a while loop):
time ./test.rb ~/Arbeit/asguard/bundles/asguard/logs/current/
Replaying all samples
Orocos[WARN]: No ports are selected. Assuming that all ports shall be
replayed.
Orocos[WARN]: Connect port(s) or set their track flag to true to get rid
of this message.
pocolog.rb[INFO]: Got 79 streams with 263646 samples
pocolog.rb[INFO]: Stream Aligner index created
Done

real 0m56.849s
user 0m54.927s
sys 0m1.832s

Using the C++ implementation including type unmarshalling:
time ./multiIndexer ~/Arbeit/asguard/bundles/asguard/logs/current/*.log
Building multi file index
100% Done
Processed 263647 of 263647 samples
Replaying all samples
99% DoneCould not load sample data of sample 8211 of stream
/laser_filter_front.filtered_scans stream size 8212
First sample time 20140609-21:28:33:145982 last sample time
20140609-21:29:41:690300
Took 10.137.364 realtime 68.544.318

real 0m10.864s
user 0m9.189s
sys 0m1.660s

Typelib marshalling is not a big problem here; it makes a difference of
2 seconds in user time.
Greetings
Janosch

Sylvain Joyeux
2014-06-09 21:39:35 UTC
Permalink
Post by Janosch Machowinski
And here is the point i criticize about the bashing from sylvain, he
didn't have a point
at all, and started bashing me for no reason. I have good reasons for the
things I do, and
I really hate it that I need to defend every step I make.
For the record: I've sent three mails in this whole thread that were
questioning your approach. In these three emails, 80% was giving facts
(basically, the benchmarking I've done) and the remaining 20% was saying
that I believe you are optimizing the wrong parts and explaining WHY I
believe so. Having re-read these emails (3 times), I really don't see
any bashing (but I've been told that I have a high threshold for what
counts as bashing).

The most offensive statement is probably
In other words: you are attempting to optimize something without having
done any profiling. This is a cardinal sin.
Which is true on both counts:
- you would have shown some detailed profiling of pocolog runs if it
were otherwise
- optimizing without profiling is like debugging by looking at the code:
trying to do science without ever running an experiment.
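[Sylvain's "measure before you optimize" point can be made concrete with
Ruby's stdlib alone. The method below is an invented stand-in for whatever
hot loop is under suspicion, not pocolog code; for per-method breakdowns a
real profiler such as ruby-prof or perf would be used instead.]

```ruby
require 'benchmark'

# Stand-in for a suspected hot loop: building a list of
# (time, byte-offset) pairs, as an index build would.
def scan_samples(n)
  (0...n).map { |i| [i, i * 8] }
end

# Measure it before deciding it is the bottleneck.
timing = Benchmark.measure { scan_samples(100_000) }
puts format('scan: %.3fs user, %.3fs wall-clock', timing.utime, timing.real)
```

Comparing such numbers across candidate hot spots is what tells you which
one is actually worth optimizing.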

Post by Janosch Machowinski
real 0m56.849s
user 0m54.927s
sys 0m1.832s
Are you *even* interested in the fact that I replay twice as many
samples, in more streams, in a quarter of the time on my machine? That's
what bothers me here (and why I think you're looking at this from the
wrong angle). It would leave 75% left to optimize after your changes. Or
maybe my benchmarking is wrong. You want to handwave it away? Your
choice to make! But simply say so instead of calling me names.

You've measured that typelib demarshalling is not the issue? Great, you
proved me wrong there, I can have a look elsewhere (mea culpa: I did
affirm in a previous email that it was the problem without having
profiled it yet. Having done some real profiling today, I could have
told you that it was indeed not the main issue). Now, it does *not*
remove the fact that I replay (with a clean pure-pocolog script) a lot
faster than your code does. The question I am asking myself is why. It
seems that I should be ashamed that I do.

Anyways, this closes the issue for me. F.. do whatever you want.
Sylvain
Sylvain Joyeux
2014-06-09 20:08:05 UTC
Permalink
Post by Alexander Duda
Seriously, it is great that you are looking into it. But you are also
asking other people to help you, costing them quite a bit of their free time.
To be fair to Janosch, answering his questions (and only them) would have
cost me close to zero time. If I end up spending a lot of time here, it's
because I wished to.

Sylvain
Sylvain Joyeux
2014-06-09 20:06:57 UTC
Permalink
On Mon, Jun 9, 2014 at 7:08 PM, Janosch Machowinski <
Post by Janosch Machowinski
Seriously, go to hell.
Sure, will do ...
Post by Janosch Machowinski
You know what's the problem with you: whenever one is motivated to fix a
problem with Rock, and it's not done the way you think is best, you
start bashing around. Btw, a correct response would be: great, you
looked into it and made it faster, go on.
Hey. You've not *yet* made it faster. You will have made it faster once
you've provided the same level of functionality as the current
application. Which means developing a Ruby binding, adapting pocolog
and -- if it ends up this way -- adapting the code that uses pocolog.
Half an hour of profiling pocolog showed me quite a lot that can be
optimized there with small, non-intrusive patches.

I am trying to find out whether you *are* wasting your time or not.
That's what benchmarking and profiling are about. At this point, the
right answer would have been "oops, weird that you don't get the same
numbers as I do at all, let's look deeper", not some misplaced sense of
pride. For what it is worth, you can waste your time for all I care. If
you end up not improving the situation, *I* won't have lost anything,
and if you do, I gain something. Win-win for me.

Now, I'd very much like to be proven wrong (yay, faster replay!). I just
wonder why you take somebody who tells you "hey, I've measured pocolog
performance and it is not as bad as you claim it to be" as a personal
attack. Look at the "test" you sent and tell me the many things that are
wrong with it. There are subtle issues with benchmarking in general, but
in this case subtlety is not really the main problem.

Sylvain
Janosch Machowinski
2014-06-09 09:57:00 UTC
Permalink
I did test with indexing of a single file only. The result was 12
seconds versus 300 ms for a 100 MB log file. For the stream alignment of
my test cases, I can't say how long it took with pocolog; I guess around
30 seconds (can't retest, today is a holiday in Germany). The C++
implementation did it in 2.4 seconds.
Greetings
Janosch
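[For context on the indexing numbers in this message: the kind of per-stream
index being compared amounts to one sequential pass over the file that
records a byte offset per sample. A sketch follows, under an invented
fixed-size record layout (8-byte little-endian timestamp plus 8-byte
payload); pocolog's real on-disk format differs.]

```ruby
require 'stringio'

RECORD_SIZE = 16 # invented layout: 8-byte timestamp + 8-byte payload

# One pass over the stream, recording [time, byte_offset] for every
# sample, so that replay can seek() directly instead of re-parsing.
def build_index(io)
  index = []
  until io.eof?
    offset = io.pos
    record = io.read(RECORD_SIZE)
    break if record.nil? || record.bytesize < RECORD_SIZE # truncated tail
    time = record[0, 8].unpack1('Q<') # little-endian 64-bit timestamp
    index << [time, offset]
  end
  index
end
```

Replay can then binary-search the [time, offset] pairs and seek to the
matching offset, which is what makes an on-disk index worth the one-time
build cost.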
Post by Sylvain Joyeux
Could you post the result of running pocolog on your test logfiles ?
Last time I looked, low-level access in pocolog was *not* the
performance hog, the stream aligner was (which -- I guess -- means
there is a bug somewhere). I'd like to verify that.
Sylvain
On Sun, Jun 8, 2014 at 11:01 AM, Sylvain Joyeux <bir.sylvain at gmail.com
On Sat, Jun 7, 2014 at 11:13 PM, Janosch Machowinski
Anyway, giving out unpacked types seems not to work,
as one needs the RTT typesystem for unpacking (if I
understood the code right...)
How would I create a packed typelib::Value on the
C++ side?
What do you call "packed" and "unpacked" there? /me very confused
Sylvain