Discussion:
[valgrind] [Bug 386945] New: Bogus memcheck errors on ppc64(le) when using strcmp() with gcc-7
Markus Trippelsdorf
2017-11-15 13:29:51 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

Bug ID: 386945
Summary: Bogus memcheck errors on ppc64(le) when using strcmp()
with gcc-7
Product: valgrind
Version: 3.14 SVN
Platform: Other
URL: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: memcheck
Assignee: ***@acm.org
Reporter: ***@yandex.com
Target Milestone: ---

***@gcc2-power8 ~ % cat example.c
#include <string.h>

int main ()
{
  char str1[15];
  char str2[15];
  strcpy(str1, "abcdef");
  strcpy(str2, "ABCDEF");
  return strcmp(str1, str2);
}

***@gcc2-power8 ~ % gcc -O2 -g example.c
***@gcc2-power8 ~ % valgrind ./a.out
==28988== Memcheck, a memory error detector
==28988== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28988== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==28988== Command: ./a.out
==28988==
==28988== Conditional jump or move depends on uninitialised value(s)
==28988== at 0x100004E8: main (example.c:9)

A new strcmp optimization was added in gcc-7, and since then valgrind
reports these bogus errors. So some kind of suppression would be needed.
--
You are receiving this mail because:
You are watching all bug changes.
Markus Trippelsdorf
2017-11-15 13:53:44 UTC

--- Comment #1 from Markus Trippelsdorf <***@yandex.com> ---
Created attachment 108876
--> https://bugs.kde.org/attachment.cgi?id=108876&action=edit
assembler code

Also happens with v3.13:

==142441== Memcheck, a memory error detector
==142441== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==142441== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==142441== Command: ./a.out
==142441==
==142441== Conditional jump or move depends on uninitialised value(s)
==142441== at 0x10000508: main (example.c:9)

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479#c10 for analysis.
Julian Seward
2017-11-16 19:49:07 UTC

--- Comment #2 from Julian Seward <***@acm.org> ---
This is caused by insufficiently precise definedness analysis for
(1) count leading zero operations (Iop_Clz32, Iop_Clz64)
(2) PPC integer compares (Iop_CmpORD family)
(3) integer subtract

Here's an initial, hacky patch that fixes (1) and (2). Along with
use of the flag --expensive-definedness-checks=yes, which fixes (3),
it makes the test case run quiet.

Can you give it a try on something larger?
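Point (1) can be illustrated with a small model of an exact definedness rule for count-leading-zeros (a sketch only, not the actual mc_translate.c code; names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch (not Valgrind's actual implementation) of an exact definedness
   rule for 32-bit count-leading-zeros.  'vbits' has a 1 for every
   UNdefined bit of 'val'.  clz only inspects the bits from the top down
   to and including the topmost 1 bit, so the result is fully defined
   iff all of those inspected bits are defined. */
static uint32_t clz32(uint32_t x)
{
   return x ? (uint32_t)__builtin_clz(x) : 32;
}

static int clz32_result_defined(uint32_t val, uint32_t vbits)
{
   if (val == 0)
      return vbits == 0;      /* every bit was inspected */
   uint32_t p = 31u - (uint32_t)__builtin_clz(val);  /* leading-1 position */
   uint32_t inspected = ~((1u << p) - 1u);           /* bits 31..p */
   return (vbits & inspected) == 0;
}
```

An undefined bit below the leading defined 1 bit cannot change the count, which is exactly the precision the naive rule (any undefined input bit taints the whole result) gives up.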
Julian Seward
2017-11-16 19:49:52 UTC

--- Comment #3 from Julian Seward <***@acm.org> ---
Created attachment 108905
--> https://bugs.kde.org/attachment.cgi?id=108905&action=edit
Initial attempt at a fix
Julian Seward
2017-11-16 19:52:23 UTC

Julian Seward <***@acm.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #108905|0 |1
is obsolete| |

--- Comment #4 from Julian Seward <***@acm.org> ---
Created attachment 108906
--> https://bugs.kde.org/attachment.cgi?id=108906&action=edit
Initial attempt at a fix

Second attempt, attaching the correct patch this time.
Markus Trippelsdorf
2017-11-16 21:24:40 UTC

--- Comment #5 from Markus Trippelsdorf <***@yandex.com> ---
(In reply to Julian Seward from comment #2)
> This is caused by insufficiently precise definedness analysis for
> (1) count leading zero operations (Iop_Clz32, Iop_Clz64)
> (2) PPC integer compares (Iop_CmpORD family)
> (3) integer subtract
> Here's an initial, hacky patch that fixes (1) and (2). Along with
> use of the flag --expensive-definedness-checks=yes, which fixes (3),
> it makes the test case run quiet.
> Can you give it a try on something larger?

Unfortunately it doesn't work yet.
I still get "Invalid read of size 4" and "Conditional jump or move depends on
uninitialised value(s)" on many strcmp() invocations.
Plus the patch corrupts memory and gcc ICEs now.
(Testcase is running g++ under valgrind when compiling a small C++ program.)
Julian Seward
2017-11-17 11:18:29 UTC

Julian Seward <***@acm.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #108906|0 |1
is obsolete| |

--- Comment #6 from Julian Seward <***@acm.org> ---
Created attachment 108913
--> https://bugs.kde.org/attachment.cgi?id=108913&action=edit
Next attempt at a fix

* fixes crashing (from my buggy implementation of cntlzd)
* removes all (I think) invalid read warnings (I reimplemented ldbrx)

Still invalidly reports conditional branches on uninit data. Something
is not right somewhere.

Really, how Valgrind handles integer conditional branches on POWER
isn't good (too complex, hard to analyse) and should be reimplemented.
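For readers unfamiliar with ldbrx (reimplemented above), a C model of its behaviour, written for a little-endian host (a sketch, not the VEX code; the helper name is ours):

```c
#include <stdint.h>
#include <string.h>

/* C model (for a little-endian host; not the VEX implementation) of the
   POWER ldbrx instruction the inlined strcmp uses: load 8 bytes, possibly
   unaligned, and byte-reverse them, so the byte at the LOWEST address
   ends up in the MOST significant position of the result. */
static uint64_t ldbrx_model(const void *p)
{
   uint64_t v;
   memcpy(&v, p, 8);             /* may be an unaligned load, as the asm does */
   return __builtin_bswap64(v);  /* LE host: reverse into big-endian order */
}
```

Because the first string byte lands in the most significant position, an ordinary unsigned doubleword compare orders the strings lexicographically.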
Markus Trippelsdorf
2017-11-17 12:15:16 UTC

--- Comment #7 from Markus Trippelsdorf <***@yandex.com> ---
Thanks.
The gcc crash is gone, but I still get lots of invalid read warnings.

However, the number of errors is much lower now:

From (valgrind trunk):
ERROR SUMMARY: 89817 errors from 298 contexts
To (your patch and --expensive-definedness-checks=yes):
ERROR SUMMARY: 6495 errors from 85 contexts

BTW valgrind trunk (without any patch) doesn't like
--expensive-definedness-checks=yes:

==64698== Memcheck, a memory error detector
==64698== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==64698== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright
info
==64698== Command:
/home/trippels/gcc_test/usr/local/bin/../libexec/gcc/powerpc64le-unknown-linux-gnu/8.0.0/cc1plus
-fpreprocessed bench.ii -quiet -dumpbase bench.cpp -auxbase ben
ch -O2 -version -o bench.s
==64698==

vex: priv/host_ppc_isel.c:1531 (iselWordExpr_R_wrk): Assertion `0' failed.
vex storage: T total 666718712 bytes allocated
vex storage: P total 192 bytes allocated

valgrind: the 'impossible' happened:
LibVEX called failure_exit().

host stacktrace:
==64698== at 0x58084148: show_sched_status_wrk (m_libcassert.c:355)
==64698== by 0x5808430F: report_and_quit (m_libcassert.c:426)
==64698== by 0x5808458B: panic (m_libcassert.c:502)
==64698== by 0x5808458B: vgPlain_core_panic_at (m_libcassert.c:507)
==64698== by 0x580845DB: vgPlain_core_panic (m_libcassert.c:512)
==64698== by 0x580B1987: failure_exit (m_translate.c:751)
==64698== by 0x581CBA7B: vex_assert_fail (main_util.c:247)
==64698== by 0x5825AC83: iselWordExpr_R_wrk (host_ppc_isel.c:1531)
==64698== by 0x5825AC83: iselWordExpr_R (host_ppc_isel.c:1404)
==64698== by 0x5825E48F: iselWordExpr_RH_wrk (host_ppc_isel.c:2756)
==64698== by 0x5825E48F: iselWordExpr_RH (host_ppc_isel.c:2704)
==64698== by 0x58259B6B: iselWordExpr_R_wrk (host_ppc_isel.c:1477)
==64698== by 0x58259B6B: iselWordExpr_R (host_ppc_isel.c:1404)
==64698== by 0x58259E6B: iselWordExpr_R_wrk (host_ppc_isel.c:1505)
==64698== by 0x58259E6B: iselWordExpr_R (host_ppc_isel.c:1404)
==64698== by 0x5825E48F: iselWordExpr_RH_wrk (host_ppc_isel.c:2756)
==64698== by 0x5825E48F: iselWordExpr_RH (host_ppc_isel.c:2704)
==64698== by 0x58259AFB: iselWordExpr_R_wrk (host_ppc_isel.c:1481)
==64698== by 0x58259AFB: iselWordExpr_R (host_ppc_isel.c:1404)
==64698== by 0x58258363: iselCondCode_wrk (host_ppc_isel.c:2965)
==64698== by 0x5826AAF7: iselCondCode (host_ppc_isel.c:2915)
==64698== by 0x5826AAF7: iselStmt (host_ppc_isel.c:6385)
==64698== by 0x5826AAF7: iselSB_PPC (host_ppc_isel.c:6983)
==64698== by 0x581C859F: libvex_BackEnd (main_main.c:1049)
==64698== by 0x581C859F: LibVEX_Translate (main_main.c:1186)
==64698== by 0x580B5323: vgPlain_translate (m_translate.c:1805)
==64698== by 0x58110E9F: handle_chain_me (scheduler.c:1084)
==64698== by 0x58113FFF: vgPlain_scheduler (scheduler.c:1428)
==64698== by 0x5812F697: thread_wrapper (syswrap-linux.c:103)
==64698== by 0x5812F697: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 64698)
==64698== at 0x10B22120: equal (hash-traits.h:221)
==64698== by 0x10B22120: equal_keys (hash-map-traits.h:57)
==64698== by 0x10B22120: equal (hash-map.h:44)
==64698== by 0x10B22120: hash_table<hash_map<nofree_string_hash, opt_pass*,
simple_hashmap_traits<default_hash_traits<nofree_string_hash>, opt_pass*>
::hash_entry, xcallocator
::find_with_hash(char const* const&, unsigned int) (hash-table.h:846)
==64698== by 0x10B18C33: get (hash-map.h:161)
==64698== by 0x10B18C33: gcc::pass_manager::register_pass_name(opt_pass*,
char const*) (passes.c:864)
==64698== by 0x10B19433:
gcc::pass_manager::register_one_dump_file(opt_pass*) (passes.c:834)
==64698== by 0x10B19567: gcc::pass_manager::register_dump_files(opt_pass*)
(passes.c:846)
==64698== by 0x10B195A7: gcc::pass_manager::register_dump_files(opt_pass*)
(passes.c:849)
==64698== by 0x10B195A7: gcc::pass_manager::register_dump_files(opt_pass*)
(passes.c:849)
==64698== by 0x10B1F7EF: gcc::pass_manager::pass_manager(gcc::context*)
(passes.c:1615)
==64698== by 0x101C220B: general_init (toplev.c:1168)
==64698== by 0x101C220B: toplev::main(int, char**) (toplev.c:2148)
==64698== by 0x101C51A7: main (main.c:39)
Julian Seward
2017-11-17 13:49:29 UTC

--- Comment #8 from Julian Seward <***@acm.org> ---
(In reply to Markus Trippelsdorf from comment #7)
> Thanks.
> The gcc crash is gone, but I still get lots of invalid read warnings.
> ERROR SUMMARY: 89817 errors from 298 contexts
> ERROR SUMMARY: 6495 errors from 85 contexts

I re-checked my fixes. I cannot figure out why V still reports undefined
value errors now. Can you point me at the bit of the gcc source that generates
this code? Maybe that has some further explanation of how this should work.

> I still get lots of invalid read warnings.

I get those too (see below). These are misaligned (non-8-byte-aligned)
ldbrx instructions (I checked) as part of this inlined strcmp. Is the
strcmp-inlining expected to produce misaligned loads, or not?
Julian Seward
2017-11-17 14:05:53 UTC

--- Comment #9 from Julian Seward <***@acm.org> ---
Misaligned loads as referred to in comment 8:

==63134== Invalid read of size 8
==63134== at 0xB163FE4: process_symtab (lto-plugin.c:905)
==63134== by 0xB16D897: simple_object_elf_find_sections
(simple-object-elf.c:570)
==63134== by 0xB16BAF7: simple_object_find_sections (simple-object.c:194)
==63134== by 0xB1645A7: claim_file_handler (lto-plugin.c:1009)
==63134== by 0x1003A1BF: ??? (in /usr/bin/ld)
==63134== by 0x10035F9F: ??? (in /usr/bin/ld)
==63134== Address 0x4797413 is 243 bytes inside a block of size 249 alloc'd
==63134== at 0x4083F40: malloc (vg_replace_malloc.c:299)
==63134== by 0x41EC7FB: xmalloc (in
/usr/lib64/libbfd-2.25.1-32.base.el7_4.1.so)
==63134== by 0xB16D7CF: simple_object_elf_find_sections
(simple-object-elf.c:535)
==63134== by 0xB16BAF7: simple_object_find_sections (simple-object.c:194)
==63134== by 0xB1645A7: claim_file_handler (lto-plugin.c:1009)
==63134== by 0x1003A1BF: ??? (in /usr/bin/ld)
==63134== by 0x10035F9F: ??? (in /usr/bin/ld)

==63134==
==63134== Invalid read of size 8
==63134== at 0xB1639F4: process_offload_section (lto-plugin.c:952)
==63134== by 0xB16D897: simple_object_elf_find_sections
(simple-object-elf.c:570)
==63134== by 0xB16BAF7: simple_object_find_sections (simple-object.c:194)
==63134== by 0xB164757: claim_file_handler (lto-plugin.c:1022)
==63134== by 0x1003A1BF: ??? (in /usr/bin/ld)
==63134== by 0x10035F9F: ??? (in /usr/bin/ld)
==63134== Address 0x4797c53 is 243 bytes inside a block of size 249 alloc'd
==63134== at 0x4083F40: malloc (vg_replace_malloc.c:299)
==63134== by 0x41EC7FB: xmalloc (in
/usr/lib64/libbfd-2.25.1-32.base.el7_4.1.so)
==63134== by 0xB16D7CF: simple_object_elf_find_sections
(simple-object-elf.c:535)
==63134== by 0xB16BAF7: simple_object_find_sections (simple-object.c:194)
==63134== by 0xB164757: claim_file_handler (lto-plugin.c:1022)
==63134== by 0x1003A1BF: ??? (in /usr/bin/ld)
==63134== by 0x10035F9F: ??? (in /usr/bin/ld)
Markus Trippelsdorf
2017-11-17 14:40:36 UTC

--- Comment #10 from Markus Trippelsdorf <***@yandex.com> ---
For some reason I cannot CC Aaron Sawdey, who wrote the PPC strcmp patch:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=244598
Julian Seward
2017-11-17 20:40:29 UTC

--- Comment #11 from Julian Seward <***@acm.org> ---
(In reply to Markus Trippelsdorf from comment #10)
I can't move this along without further input. Maybe Aaron doesn't have
an account at the KDE bugzilla.

In particular I'd like to understand why there are misaligned loads happening.
Markus Trippelsdorf
2017-11-18 06:31:41 UTC

--- Comment #12 from Markus Trippelsdorf <***@yandex.com> ---
(In reply to Julian Seward from comment #11)
> I can't move this along without further input. Maybe Aaron doesn't have
> an account at the KDE bugzilla.
> In particular I'd like to understand why there are misaligned loads
> happening.

As already mentioned above, Aaron describes his algorithm here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479#c10
Quote: »If both strings have alignment > 8 we cannot inadvertently cross a page
boundary doing 8B loads. For any argument that has smaller alignment, it emits
a runtime check to see if the inline code would cross a 4k boundary. If so, the
library function call is used instead of the inline code.«
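The quoted runtime check can be sketched in C (the 4032 threshold is taken from the disassembly appearing later in this thread, clrldi r9,rN,52 ; cmpdi r9,4032; the function name is ours):

```c
#include <stdint.h>

/* Sketch of the runtime guard described above: if the low 12 bits of a
   pointer are >= 4032 (0xFC0), the inline code's 8-byte loads might
   cross a 4 KiB page boundary, so the out-of-line strcmp is called
   instead.  The 4032 constant is an assumption read off the disassembly
   in this report, leaving 64 bytes of headroom before the page end. */
static int might_cross_4k_page(const void *p)
{
   return ((uintptr_t)p & 0xFFFu) >= 4032u;
}
```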
Aaron Sawdey
2017-11-18 13:51:54 UTC

Aaron Sawdey <***@linux.vnet.ibm.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@linux.vnet.ibm.com

--- Comment #13 from Aaron Sawdey <***@linux.vnet.ibm.com> ---
The unaligned ldbrx are expected -- there is not much penalty for doing this on
power7/power8 so it's faster for these short comparisons to just do it possibly
unaligned rather than have the extra branches and code to do a partial
comparison to get one/both arguments aligned.

Targeting power8 we can generate overlapping unaligned ldbrx to avoid having to
do multiple smaller loads or a shift/mask to compare something less than a
doubleword at the end. Power7 doesn't like that so there we do the shift/mask.

Doing the full doubleword compare first means that in the equality case I can
check for zero byte with just one cmpb. In the case where the doublewords are
not equal then two cmpb are needed, one to compare the string data and the
other to check just one of them for a zero byte.

The code that generates this is in expand_strn_compare() in rs6000.c (gcc 7)
or rs6000-string.c (gcc 8).
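The cmpb behaviour Aaron relies on can be modelled in C (a sketch; cmpb_model is our name, not a real API):

```c
#include <stdint.h>

/* C model of the POWER cmpb instruction used by the sequence described
   above: for each of the 8 byte lanes, the result byte is 0xFF if the
   corresponding bytes of a and b are equal, else 0x00.  Hence
   cmpb_model(x, 0) is nonzero exactly when x contains a zero byte,
   which is the single-instruction zero-byte check described above. */
static uint64_t cmpb_model(uint64_t a, uint64_t b)
{
   uint64_t r = 0;
   for (int i = 0; i < 8; i++) {
      uint64_t ab = (a >> (8*i)) & 0xFF;
      uint64_t bb = (b >> (8*i)) & 0xFF;
      if (ab == bb)
         r |= 0xFFull << (8*i);
   }
   return r;
}
```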
Markus Trippelsdorf
2017-12-11 07:29:36 UTC

Markus Trippelsdorf <***@yandex.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@us.ibm.com
Julian Seward
2018-01-05 15:43:40 UTC

--- Comment #14 from Julian Seward <***@acm.org> ---
Here's a program that computes the CR.{LT,GT,EQ} bits after "cmplw"
using just AND, OR, XOR, NOT, SUB, SHL and SHR. For "cmplw" (the
only case tested here so far), it produces the same results as the
hardware.

Since we have exact V-bit interpretations of all of the abovementioned
operations (in mc_translate.c), we should be able to mechanically
derive an exact V-bit interpretation of Iop_CmpORD32U. If we do
the same for Iop_CmpORD64U and their signed equivalents, we'll be a
long way along the road to getting these false-positive problems
with POWER condition codes fixed.

Performance will probably be pretty horrible. But I am hoping that
there will be a lot of duplicated expressions in the resulting V-bit
DAG, so if those are commoned up it might not be quite so bad.

------------------------

#include <stdio.h>
#include <assert.h>

typedef unsigned int UInt;


__attribute__((noinline))
UInt h_CmpORD32U ( UInt x, UInt y )
{
   UInt res;
   __asm__ __volatile__(
      ""     "cmplw 0, %1, %2"
      "\n\t" "mfcr %0"
      : /*OUT*/ "=r"(res) : /*IN*/ "r"(x), "r"(y) : /*TRASH*/ "cc", "memory"
   );
   return (res >> 28) & 0xF;
}

__attribute__((noinline))
UInt s_CmpORD32U ( UInt x, UInt y )
{
#if 0
   if (x < y) return 8;
   if (x > y) return 4;
   return 2;
#else
   /* x-y unsignedly overflows
      ==> x < y
   */
   UInt x_minus_y_UO = (~x & y) | ((~(x ^ y)) & (x-y));
   x_minus_y_UO >>= 31;

   UInt y_minus_x_UO = (~y & x) | ((~(y ^ x)) & (y-x));
   y_minus_x_UO >>= 31;

   //UInt res = (x_minus_y_UO << 3) | (y_minus_x_UO << 2);
   //if (res == 0) res = 2;

   UInt x_eq_y = (x_minus_y_UO | y_minus_x_UO) ^ 1;
   UInt res = (x_minus_y_UO << 3) | (y_minus_x_UO << 2) | (x_eq_y << 1);

   return res;
#endif
}

const UInt vals[] =
   { 0x0,
     0x1,
     0x2,

     0x3ffffffeU,
     0x3fffffffU,
     0x40000000U,
     0x40000001U,
     0x40000002U,

     0x7ffffffeU,
     0x7fffffffU,
     0x80000000U,
     0x80000001U,
     0x80000002U,

     0xbffffffeU,
     0xbfffffffU,
     0xc0000000U,
     0xc0000001U,
     0xc0000002U,

     0xfffffffdU,
     0xfffffffeU,
     0xffffffffU
   };

int main ( void )
{
   UInt xi, yi;
   const UInt nElem = sizeof(vals) / sizeof(vals[0]);
   for (xi = 0; xi < nElem; xi++) {
      for (yi = 0; yi < nElem; yi++) {
         UInt x = vals[xi];
         UInt y = vals[yi];
         UInt resHU = h_CmpORD32U(x, y);
         UInt resSU = s_CmpORD32U(x, y);
         printf("%08x %08x -> %02x %02x\n", x, y, resHU, resSU);
         assert(resHU == resSU);
      }
   }
   return 0;
}
Julian Seward
2018-02-27 16:21:51 UTC

--- Comment #15 from Julian Seward <***@acm.org> ---
I have a patch which I've been using for investigating this. It reduces the
noise level significantly, but doesn't remove it entirely. I'll post it in
a following comment. In the meantime I have a small failing testcase and a
question about acsawdey's strcmp implementation. Here's the testcase:

__attribute__((noinline))
char* setup() { return strndup("abcdef", 12); }

int main (void) {
   char* x = setup();
   return strcmp(x, "abcdef") == 0 ? 99 : 77;
}

|setup| is done out of line only so as to make the assembly for |main|
easier to follow.

Even with the abovementioned patch in place, Memcheck still reports a branch
on undefined values in |main|, in the inlined |strcmp|. This is when
compiled with gcc-7.3.0 at -O2.

I single-stepped through this with GDB attached to Valgrind, so I can look
at both the register values and what Memcheck thinks their definedness state
is, after every instruction. The important part of the trace follows. A
"." means the instruction was executed. ".nt" is "not taken" and ".t" is
"taken".

0000000010000450 <main>:
. 10000450: 03 10 40 3c lis r2,4099
. 10000454: 00 81 42 38 addi r2,r2,-32512
. 10000458: a6 02 08 7c mflr r0
. 1000045c: 10 00 01 f8 std r0,16(r1)
. 10000460: a1 ff 21 f8 stdu r1,-96(r1)
. 10000464: 25 03 00 48 bl 10000788 <setup+0x8>
. 10000468: fe ff 82 3c addis r4,r2,-2
. 1000046c: 78 88 84 38 addi r4,r4,-30600
. 10000470: 20 05 69 78 clrldi r9,r3,52
. 10000474: c0 0f a9 2f cmpdi cr7,r9,4032
.nt 10000478: 5c 01 9c 40 bge cr7,100005d4 <main+0x184>
. 1000047c: fe ff 42 3d addis r10,r2,-2
. 10000480: 28 1c 20 7d ldbrx r9,0,r3
. 10000484: 78 1b 68 7c mr r8,r3
. 10000488: 78 88 4a 39 addi r10,r10,-30600
. 1000048c: 28 54 40 7d ldbrx r10,0,r10

At this point, we've loaded r10 from presumably a constant pool, and it
contains "abcdef\0\0", all bytes defined. Also, we've loaded r9 from the
block allocated by strndup. It just so happens that r9 also now holds the
value "abcdef\0\0", but because that block is only 7 bytes long (as we
expect), that last \0 is marked as undefined.

. 10000490: 51 48 6a 7c subf. r3,r10,r9

Now r3 contains zero, because r10 == r9, but the lowest 8 bits of r3 are
marked as undefined, because the lowest 8 bits of r9 are undefined.

.t 10000494: 2c 00 82 41 beq 100004c0 <main+0x70> *********

At this beq, Memcheck issues an error. It believes -- correctly, I think --
that the branch depends on uninitialised data. Specifically, we are
comparing "abcdef\0\0" with "abcdef\0<undefined>", and since the 7 defined
bytes are identical, the branch actually depends on the undefined lowest
byte of r9.

10000498: 00 00 e0 38 li r7,0
1000049c: f8 53 28 7d cmpb r8,r9,r10
100004a0: f8 3b 27 7d cmpb r7,r9,r7
100004a4: 38 43 e8 7c orc r8,r7,r8
100004a8: 74 00 08 7d cntlzd r8,r8
100004ac: 08 00 08 39 addi r8,r8,8
100004b0: 30 46 23 79 rldcl r3,r9,r8,56
100004b4: 30 46 4a 79 rldcl r10,r10,r8,56
100004b8: 50 18 6a 7c subf r3,r10,r3
100004bc: 20 01 00 48 b 100005dc <main+0x18c>
. 100004c0: f8 1b 29 7d cmpb r9,r9,r3
. 100004c4: 00 00 a9 2f cmpdi cr7,r9,0
.t 100004c8: 14 01 9e 40 bne cr7,100005dc <main+0x18c>

So two questions:

(1) Does the above analysis seem correct?

(2) If so, am I correct to understand that the branch on uninitialised data
is intended, and that this is "fixed up" later on (perhaps beginning at
100004c0) so that the correct answer is nevertheless obtained?
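The behaviour described above (r3 equal to zero after the subf., but with an undefined low byte) can be reproduced with an interval-style definedness rule for subtraction, in the spirit of --expensive-definedness-checks=yes (a sketch, not Memcheck's literal mc_translate.c code; the function name is ours):

```c
#include <stdint.h>

/* Interval-style definedness sketch for subtraction.  va/vb have a 1 for
   each undefined bit of a/b.  Form the smallest and largest concrete
   results the operands could produce, and mark as undefined every bit
   where those two extremes disagree, plus the input undefined bits. */
static uint64_t sub_vbits(uint64_t a, uint64_t va, uint64_t b, uint64_t vb)
{
   uint64_t a_min = a & ~va, a_max = a | va;
   uint64_t b_min = b & ~vb, b_max = b | vb;
   uint64_t lo = a_min - b_max;   /* smallest possible result */
   uint64_t hi = a_max - b_min;   /* largest possible result  */
   return va | vb | (lo ^ hi);
}
```

For the subf. above: both doublewords hold "abcdef\0\0" (as ldbrx'd values) and only the low byte of r9 is undefined, so the rule yields vbits 0xFF: exactly the low 8 bits of r3 undefined, matching what Memcheck tracks.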
Julian Seward
2018-02-27 17:14:26 UTC

Julian Seward <***@acm.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #108913|0 |1
is obsolete| |

--- Comment #16 from Julian Seward <***@acm.org> ---
Created attachment 111056
--> https://bugs.kde.org/attachment.cgi?id=111056&action=edit
WIP patch, as described in comment 15
Aaron Sawdey
2018-02-27 17:26:08 UTC

--- Comment #17 from Aaron Sawdey <***@linux.vnet.ibm.com> ---
Julian,
The analysis is ok. The code that follows on either branch looks for a zero
byte in the process of producing the final result.

What happens after the subf./beq is that a cmpb with 0 is done to check that we
are not going past the end of the string:

beq 0,.L19
.L13:
li 7,0
cmpb 8,9,10
cmpb 7,9,7
orc 8,7,8
cntlzd 8,8
addi 8,8,8
rldcl 3,9,8,56
rldcl 10,10,8,56
subf 3,10,3
b .L10
.L19:
cmpb 9,9,3
cmpdi 7,9,0
bne 7,.L10

If the beq is not taken, the cmpb/cmpb/orc sequence produces a result that has
FF in the byte position of either the first difference, or the first zero byte.
Then this identified byte is extracted from both strings and subtracted to
produce the final result.

We only need to look for zero in one string because: if one has zero and the
other does not, this will be a difference. If both have a zero byte in the same
position then comparing either one with all zeros will make that zero byte the
one that is extracted and subtracted to produce the final result.

If the beq is taken, then r3 contains zero and is used to see if there is a
zero byte in r9 (which must be equal to r10). If there is a zero byte, then
the strings are equal and r3 is the result.
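For concreteness, the cmpb/cmpb/orc/cntlzd/rldcl/subf sequence just described can be sketched in C (helper names are ours; assumes both doublewords were loaded with ldbrx, so the first string byte is most significant):

```c
#include <stdint.h>

static uint64_t cmpb8(uint64_t x, uint64_t y)   /* cmpb model */
{
   uint64_t r = 0;
   for (int i = 0; i < 8; i++)
      if (((x >> (8*i)) & 0xFF) == ((y >> (8*i)) & 0xFF))
         r |= 0xFFull << (8*i);
   return r;
}

static uint64_t rotl64(uint64_t x, unsigned n)  /* rotate part of rldcl */
{
   n &= 63;
   return n ? (x << n) | (x >> (64 - n)) : x;
}

/* One doubleword step of the inlined strcmp.  Returns the byte difference
   at the first differing or zero byte; 0 means the doublewords are equal
   so far (continue with the next doubleword if no zero byte was seen). */
static int strcmp_tail(uint64_t a, uint64_t b)
{
   /* orc: 0xFF at every byte that differs OR is a zero byte of a */
   uint64_t neq_or_zero = cmpb8(a, 0) | ~cmpb8(a, b);
   if (neq_or_zero == 0)
      return 0;                                  /* keep comparing */
   unsigned n = (unsigned)__builtin_clzll(neq_or_zero) + 8; /* cntlzd+addi */
   unsigned ba = (unsigned)(rotl64(a, n) & 0xFF);  /* rldcl: extract byte */
   unsigned bb = (unsigned)(rotl64(b, n) & 0xFF);
   return (int)ba - (int)bb;                       /* subf */
}
```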
Julian Seward
2018-02-27 17:41:40 UTC

--- Comment #18 from Julian Seward <***@acm.org> ---
(In reply to Aaron Sawdey from comment #17)

Aaron,

Thanks for the details. It's not clear to me, though, what the answer
to

(2) If so, am I correct to understand that the branch on uninitialised data
is intended,

is. Can you give me a yes/no answer on that?
Aaron Sawdey
2018-02-27 17:48:53 UTC

--- Comment #19 from Aaron Sawdey <***@linux.vnet.ibm.com> ---
Sorry if that was unclear. Yes, the branch on uninitialized data is intended
and is caught and fixed by the cmpb with 0 that happens on both paths.
Julian Seward
2018-02-27 18:06:48 UTC

--- Comment #20 from Julian Seward <***@acm.org> ---
(In reply to Aaron Sawdey from comment #19)
> Yes, the branch on uninitialized data is intended

Thanks for the clarification.

Unfortunately, the analysis framework has the deeply wired-in assumption
that every conditional branch instruction is "important" for the final
outcome of the program. So there's no easy way (AFAIK, at least) to
fix it from the Memcheck side.

This isn't the first time I've seen this kind of thing -- a conditional
branch on uninitialised data, followed by suitable fixups afterwards.
I think gzip/zlib used to do this, but were subsequently modified so as
to not do that.

Despite some study of the problem I haven't come up with a way to solve
it. How practical is it for you to change the generated code so it
doesn't do that?

There's some related stuff in the talk at
https://fosdem.org/2018/schedule/event/debugging_tools_memcheck/.
See in particular the second half of slide 13.
Julian Seward
2018-02-28 12:12:20 UTC

--- Comment #21 from Julian Seward <***@acm.org> ---
(In reply to Julian Seward from comment #16)
> Created attachment 111056 [details]
> WIP patch, as described in comment 15

I should comment that this patch will still report invalid reads at
the end of blocks (ends of strings in malloc'd storage), when they
are misaligned but do not cross a page boundary, as described in
comment 12. This would be easy enough to fix, if someone can get
me a test case.

Is the page size in the code hardwired at 4096? Or is it 4K for
4K page systems, 64K for 64K page systems, etc?
Aaron Sawdey
2018-02-28 12:14:24 UTC

--- Comment #22 from Aaron Sawdey <***@linux.vnet.ibm.com> ---
It is always hardwired to avoid 4k boundary crossings.
Florian Weimer
2018-10-17 09:50:19 UTC

Florian Weimer <***@redhat.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@redhat.com
Mark Wielaard
2018-10-17 10:04:44 UTC

Mark Wielaard <***@klomp.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@klomp.org
Status|REPORTED |CONFIRMED
Ever confirmed|0 |1

--- Comment #23 from Mark Wielaard <***@klomp.org> ---
Any progress on this?
I am seeing more and more reports of this issue with GCC 8.1 on ppc64le.
Mark Wielaard
2018-10-17 10:23:33 UTC

--- Comment #24 from Mark Wielaard <***@klomp.org> ---
Created attachment 115701
--> https://bugs.kde.org/attachment.cgi?id=115701&action=edit
Small ppc64le binary with inlined string functions

Here is an example with some inlined string functions on Fedora 28 ppc64le:

$ cat foo.c
#include <string.h>
#include <stdio.h>

__attribute__ ((weak)) void
do_test (const char *left, const char *right)
{
  printf ("result: %d\n", strcmp (left, right));
}

int
main (void)
{
  do_test (strdup ("a"), strdup ("b"));
}

$ gcc --version | head -1
gcc (GCC) 8.1.1 20180712 (Red Hat 8.1.1-5)
$ gcc -O2 -g -o foo foo.c

$ valgrind -q ./foo 2>&1 | head -30
==10495== Invalid read of size 4
==10495== at 0x10000790: do_test (foo.c:7)
==10495== by 0x10000587: main (foo.c:13)
==10495== Address 0x4310044 is 2 bytes after a block of size 2 alloc'd
==10495== at 0x4093F6C: malloc (vg_replace_malloc.c:299)
==10495== by 0x4196F63: strdup (in /usr/lib64/libc-2.27.so)
==10495== by 0x10000563: main (foo.c:13)
==10495==
==10495== Invalid read of size 4
==10495== at 0x10000794: do_test (foo.c:7)
==10495== by 0x10000587: main (foo.c:13)
==10495== Address 0x4310094 is 2 bytes after a block of size 2 alloc'd
==10495== at 0x4093F6C: malloc (vg_replace_malloc.c:299)
==10495== by 0x4196F63: strdup (in /usr/lib64/libc-2.27.so)
==10495== by 0x10000577: main (foo.c:13)
==10495==
==10495== Conditional jump or move depends on uninitialised value(s)
==10495== at 0x1000079C: do_test (foo.c:7)
==10495== by 0x10000587: main (foo.c:13)
==10495==
==10495== Conditional jump or move depends on uninitialised value(s)
==10495== at 0x4156044: vfprintf@@GLIBC_2.17 (in /usr/lib64/libc-2.27.so)
==10495== by 0x415DED7: printf@@GLIBC_2.17 (in /usr/lib64/libc-2.27.so)
==10495== by 0x100007D7: do_test (foo.c:7)
==10495== by 0x10000587: main (foo.c:13)
==10495==
==10495== Use of uninitialised value of size 8
==10495== at 0x41522E8: _itoa_word (in /usr/lib64/libc-2.27.so)
==10495== by 0x41568B7: vfprintf@@GLIBC_2.17 (in /usr/lib64/libc-2.27.so)
==10495== by 0x100007D7: do_test (foo.c:7)

objdump -d of do_test ():

0000000010000730 <do_test>:
10000730: 02 10 40 3c lis r2,4098
10000734: 00 7f 42 38 addi r2,r2,32512
10000738: a6 02 08 7c mflr r0
1000073c: 20 05 69 78 clrldi r9,r3,52
10000740: c0 0f a9 2f cmpdi cr7,r9,4032
10000744: 10 00 01 f8 std r0,16(r1)
10000748: a1 ff 21 f8 stdu r1,-96(r1)
1000074c: 10 00 9c 40 bge cr7,1000075c <do_test+0x2c>
10000750: 20 05 89 78 clrldi r9,r4,52
10000754: c0 0f a9 2f cmpdi cr7,r9,4032
10000758: 38 00 9c 41 blt cr7,10000790 <do_test+0x60>
1000075c: a5 fd ff 4b bl 10000500
<00000039.plt_call.strcmp@@GLIBC_2.17>
10000760: 18 00 41 e8 ld r2,24(r1)
10000764: b4 07 64 7c extsw r4,r3
10000768: fe ff 62 3c addis r3,r2,-2
1000076c: 98 8b 63 38 addi r3,r3,-29800
10000770: 51 fd ff 4b bl 100004c0
<00000039.plt_call.printf@@GLIBC_2.17>
10000774: 18 00 41 e8 ld r2,24(r1)
10000778: 60 00 21 38 addi r1,r1,96
1000077c: 10 00 01 e8 ld r0,16(r1)
10000780: a6 03 08 7c mtlr r0
10000784: 20 00 80 4e blr
10000788: 00 00 00 60 nop
1000078c: 00 00 42 60 ori r2,r2,0
10000790: 28 1c 40 7d ldbrx r10,0,r3
10000794: 28 24 00 7d ldbrx r8,0,r4
10000798: 51 50 28 7d subf. r9,r8,r10
1000079c: 54 00 82 41 beq 100007f0 <do_test+0xc0>
100007a0: 00 00 20 39 li r9,0
100007a4: f8 43 43 7d cmpb r3,r10,r8
100007a8: f8 4b 49 7d cmpb r9,r10,r9
100007ac: 38 1b 23 7d orc r3,r9,r3
100007b0: 74 00 63 7c cntlzd r3,r3
100007b4: 08 00 63 38 addi r3,r3,8
100007b8: 30 1e 4a 79 rldcl r10,r10,r3,56
100007bc: 30 1e 03 79 rldcl r3,r8,r3,56
100007c0: 50 50 23 7d subf r9,r3,r10
100007c4: 78 4b 23 7d mr r3,r9
100007c8: b4 07 64 7c extsw r4,r3
100007cc: fe ff 62 3c addis r3,r2,-2
100007d0: 98 8b 63 38 addi r3,r3,-29800
100007d4: ed fc ff 4b bl 100004c0 <00000039.plt_call.printf@@GLIBC_2.17>
100007d8: 18 00 41 e8 ld r2,24(r1)
100007dc: 60 00 21 38 addi r1,r1,96
100007e0: 10 00 01 e8 ld r0,16(r1)
100007e4: a6 03 08 7c mtlr r0
100007e8: 20 00 80 4e blr
100007ec: 00 00 42 60 ori r2,r2,0
100007f0: f8 4b 4a 7d cmpb r10,r10,r9
100007f4: 00 00 aa 2f cmpdi cr7,r10,0
100007f8: cc ff 9e 40 bne cr7,100007c4 <do_test+0x94>
100007fc: 08 00 23 39 addi r9,r3,8
10000800: 28 4c 40 7d ldbrx r10,0,r9
10000804: 08 00 24 39 addi r9,r4,8
10000808: 28 4c 00 7d ldbrx r8,0,r9
1000080c: 51 50 28 7d subf. r9,r8,r10
10000810: 90 ff 82 40 bne 100007a0 <do_test+0x70>
10000814: f8 4b 4a 7d cmpb r10,r10,r9
10000818: 00 00 aa 2f cmpdi cr7,r10,0
1000081c: a8 ff 9e 40 bne cr7,100007c4 <do_test+0x94>
10000820: 10 00 23 39 addi r9,r3,16
10000824: 28 4c 40 7d ldbrx r10,0,r9
10000828: 10 00 24 39 addi r9,r4,16
1000082c: 28 4c 00 7d ldbrx r8,0,r9
10000830: 51 50 28 7d subf. r9,r8,r10
10000834: 6c ff 82 40 bne 100007a0 <do_test+0x70>
10000838: f8 4b 4a 7d cmpb r10,r10,r9
1000083c: 00 00 aa 2f cmpdi cr7,r10,0
10000840: 84 ff 9e 40 bne cr7,100007c4 <do_test+0x94>
10000844: 18 00 23 39 addi r9,r3,24
10000848: 28 4c 40 7d ldbrx r10,0,r9
1000084c: 18 00 24 39 addi r9,r4,24
10000850: 28 4c 00 7d ldbrx r8,0,r9
10000854: 51 50 28 7d subf. r9,r8,r10
10000858: 48 ff 82 40 bne 100007a0 <do_test+0x70>
1000085c: f8 4b 4a 7d cmpb r10,r10,r9
10000860: 00 00 aa 2f cmpdi cr7,r10,0
10000864: 60 ff 9e 40 bne cr7,100007c4 <do_test+0x94>
10000868: 20 00 23 39 addi r9,r3,32
1000086c: 28 4c 40 7d ldbrx r10,0,r9
10000870: 20 00 24 39 addi r9,r4,32
10000874: 28 4c 00 7d ldbrx r8,0,r9
10000878: 51 50 28 7d subf. r9,r8,r10
1000087c: 24 ff 82 40 bne 100007a0 <do_test+0x70>
10000880: f8 4b 4a 7d cmpb r10,r10,r9
10000884: 00 00 aa 2f cmpdi cr7,r10,0
10000888: 3c ff 9e 40 bne cr7,100007c4 <do_test+0x94>
1000088c: 28 00 23 39 addi r9,r3,40
10000890: 28 4c 40 7d ldbrx r10,0,r9
10000894: 28 00 24 39 addi r9,r4,40
10000898: 28 4c 00 7d ldbrx r8,0,r9
1000089c: 51 50 28 7d subf. r9,r8,r10
100008a0: 00 ff 82 40 bne 100007a0 <do_test+0x70>
100008a4: f8 4b 4a 7d cmpb r10,r10,r9
100008a8: 00 00 aa 2f cmpdi cr7,r10,0
100008ac: 18 ff 9e 40 bne cr7,100007c4 <do_test+0x94>
100008b0: 30 00 23 39 addi r9,r3,48
100008b4: 28 4c 40 7d ldbrx r10,0,r9
100008b8: 30 00 24 39 addi r9,r4,48
100008bc: 28 4c 00 7d ldbrx r8,0,r9
100008c0: 51 50 28 7d subf. r9,r8,r10
100008c4: dc fe 82 40 bne 100007a0 <do_test+0x70>
100008c8: f8 4b 4a 7d cmpb r10,r10,r9
100008cc: 00 00 aa 2f cmpdi cr7,r10,0
100008d0: f4 fe 9e 40 bne cr7,100007c4 <do_test+0x94>
100008d4: 38 00 23 39 addi r9,r3,56
100008d8: 28 4c 40 7d ldbrx r10,0,r9
100008dc: 38 00 24 39 addi r9,r4,56
100008e0: 28 4c 00 7d ldbrx r8,0,r9
100008e4: 51 50 28 7d subf. r9,r8,r10
100008e8: b8 fe 82 40 bne 100007a0 <do_test+0x70>
100008ec: f8 4b 4a 7d cmpb r10,r10,r9
100008f0: 00 00 aa 2f cmpdi cr7,r10,0
100008f4: d0 fe 9e 40 bne cr7,100007c4 <do_test+0x94>
100008f8: 40 00 84 38 addi r4,r4,64
100008fc: 40 00 63 38 addi r3,r3,64
10000900: 5c fe ff 4b b 1000075c <do_test+0x2c>
10000904: 00 00 00 00 .long 0x0
10000908: 00 00 00 01 .long 0x1000000
1000090c: 80 00 00 00 .long 0x80
--
You are receiving this mail because:
You are watching all bug changes.
Aaron Sawdey
2018-11-09 15:33:23 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #25 from Aaron Sawdey <***@linux.ibm.com> ---
Created attachment 116201
--> https://bugs.kde.org/attachment.cgi?id=116201&action=edit
strcmp test case, ppc64le gpr version
Aaron Sawdey
2018-11-09 15:34:08 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #26 from Aaron Sawdey <***@linux.ibm.com> ---
Created attachment 116202
--> https://bugs.kde.org/attachment.cgi?id=116202&action=edit
strcmp test case, ppc64le p8 vsx version
Aaron Sawdey
2018-11-09 15:34:38 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #27 from Aaron Sawdey <***@linux.ibm.com> ---
Created attachment 116203
--> https://bugs.kde.org/attachment.cgi?id=116203&action=edit
strcmp test case, ppc64le p9 vsx version
Aaron Sawdey
2018-11-09 15:37:26 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #28 from Aaron Sawdey <***@linux.ibm.com> ---
I've just attached asm generated by current gcc trunk for this test case:

#include <string.h>

int main ()
{
    char str1[15];
    char str2[15];
    strcpy(str1, "abcdef");
    strcpy(str2, "ABCDEF");
    return strcmp(str1, str2);
}

They are compiled with -mno-vsx (gpr version), -mcpu=power8 (vsx p8), and
-mcpu=power9 (vsx p9).
Julian Seward
2018-11-15 17:48:20 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

Julian Seward <***@acm.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #111056|0 |1
is obsolete| |

--- Comment #29 from Julian Seward <***@acm.org> ---
Created attachment 116331
--> https://bugs.kde.org/attachment.cgi?id=116331&action=edit
WIP patch, for testing

WIP patch. Tested on 64-bit P8 LE. This runs clean with
the two P8 test cases in comments 25 and 26:

* strcmp test case, ppc64le gpr version
* strcmp test case, ppc64le p8 vsx version

I didn't test the P9 VSX version yet. I think the patch is more accurate
and also reasonably correct, but may assert in a couple of cases due to
missing code generation cases, which I will fix. It's worth trying as-is
on P8-LE though. It applies on top of 1d42a625abd8b6d24186e6bda2b401ee24a118c4
(Philippe Waroquiers, Tue Nov 6 21:40:43 2018 +0100)
Julian Seward
2018-11-16 09:16:52 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

Julian Seward <***@acm.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #116331|0 |1
is obsolete| |

--- Comment #30 from Julian Seward <***@acm.org> ---
Created attachment 116337
--> https://bugs.kde.org/attachment.cgi?id=116337&action=edit
Patch for wider testing

Here's a patch for wider testing. If it looks OK after testing, I'll
divide it up into its various logically-independent pieces for landing.

It looks OK to me in initial testing on:

gcc135 (P9, LE64)
gcc112 (P8, LE64)
gcc110 (P7, BE64 and BE32)

One thing I changed, but couldn't test, is the implementation of vpopcntd
on 32 bit targets. So this really needs testing on a BE32 platform that can
do vpopcntd (gcc110 can't).
Aaron Sawdey
2018-11-16 18:11:45 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #31 from Aaron Sawdey <***@linux.ibm.com> ---
I tested this on a target that does support 32-bit (power8 BE) and it looked
good. Various strcmp/memcmp expansion tests do not see errors from valgrind.
There were no additional regtest failures between unpatched and patched builds.
What I don't know is whether the regtest is a sufficient test of vpopcntd.
b***@kde.org
2018-11-18 18:16:44 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

***@gmx.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmx.com
Carl Love
2018-11-19 15:54:54 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #32 from Carl Love <***@us.ibm.com> ---
Aaron, it sounds like you ran make regtest with and without the patch and
didn't see any errors. The regtest includes a fairly good test of vpopcntd in
test jm_vec_isa_2_07. So as long as you didn't see any additional errors,
specifically in jm_vec_isa_2_07, you should be good to go.
Mark Wielaard
2018-11-19 16:50:11 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #33 from Mark Wielaard <***@klomp.org> ---
I assume the testing was done against GCC9/svn trunk? If so, are there any
specific GCC9 patches that could/should be backported to GCC8?
Aaron Sawdey
2018-11-19 17:12:17 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #34 from Aaron Sawdey <***@linux.ibm.com> ---
Carl, yes I ran regtest on power8 BE (which supports 32 and 64 bit abi) with
and without the patch and didn't see any differences.

Mark, it's on my list to back port the gcc code gen change to gcc8. The patch
involved is this one:

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01589.html
Julian Seward
2018-11-20 11:16:57 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #35 from Julian Seward <***@acm.org> ---
Fixed:

27fe22378da38424102c5292b782cacdd9d7b9e4
Add support for Iop_{Sar,Shr}8 on ppc. --expensive-definedness-checks=yes
needs them.

cb5d7e047598bff6d0f1d707a70d9fb1a1c7f0e2
fold_Expr: transform PopCount64(And64(Add64(x,-1),Not64(x))) into
CtzNat64(x).

81d9832226d6e3d1ee78ee3133189d7b520e7eea
ppc front end: use new IROps added in 42719898.

e221eca26be6b2396e3fcbf4117e630fc22e79f6
Add Memcheck support for IROps added in 42719898.

97d336b79e36f6c99d8b07f49ebc9b780e6df84e
Add ppc host-side isel and instruction support for IROps added in previous
commit.

4271989815b5fc933c1e29bc75507c2726dc3738
Add some new IROps to support improved Memcheck analysis of strlen etc.

7f1dd9d5aec1f1fd4eb0ae3a311358a914f1d73f
get_otrack_shadow_offset_wrk for ppc32 and ppc64: add missing cases for
XER_OV32, XER_CA32 and C_FPCC.
b***@kde.org
2018-11-27 10:38:36 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #36 from ***@gmx.com ---
Seems only partially fixed? On power8, ppc64le, gcc 8.2, and current valgrind
git, the following still fails:

#include <stdlib.h>
#include <string.h>

int main() {

    char *foo = calloc(3, 1);

    return strcmp(foo, "a");
}
==32113== Memcheck, a memory error detector
==32113== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32113== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright
info
==32113== Command: ../valgrind-strcmp
==32113==
==32113== Invalid read of size 4
==32113== at 0x1806F8: main (valgrind-strcmp.c:8)
==32113== Address 0x4b30044 is 1 bytes after a block of size 3 alloc'd
==32113== at 0x48973E8: calloc (vg_replace_malloc.c:752)
==32113== by 0x1806DF: main (valgrind-strcmp.c:6)
==32113==
==32113== Conditional jump or move depends on uninitialised value(s)
==32113== at 0x180700: main (valgrind-strcmp.c:8)
==32113==
==32113==
==32113== HEAP SUMMARY:
==32113== in use at exit: 3 bytes in 1 blocks
==32113== total heap usage: 1 allocs, 0 frees, 3 bytes allocated
==32113==
==32113== LEAK SUMMARY:
==32113== definitely lost: 3 bytes in 1 blocks
==32113== indirectly lost: 0 bytes in 0 blocks
==32113== possibly lost: 0 bytes in 0 blocks
==32113== still reachable: 0 bytes in 0 blocks
==32113== suppressed: 0 bytes in 0 blocks
==32113== Rerun with --leak-check=full to see details of leaked memory
==32113==
==32113== For counts of detected and suppressed errors, rerun with: -v
==32113== Use --track-origins=yes to see where uninitialised values come from
==32113== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Aaron Sawdey
2018-11-27 12:24:59 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #37 from Aaron Sawdey <***@linux.ibm.com> ---
The fix is in gcc 9 trunk. I'm working on the gcc 8 back port now, so it's
definitely not in 8.2; hopefully it will be in 8.3 when that comes out.
Aaron Sawdey
2018-11-28 19:37:17 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #38 from Aaron Sawdey <***@linux.ibm.com> ---
The fix for this is checked in to gcc-8-branch svn revision 266578. I believe
this will show up in gcc 8.3.
Mark Wielaard
2018-11-29 22:36:18 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #39 from Mark Wielaard <***@klomp.org> ---
With that gcc backport https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02161.html
and the valgrind fixes we get rid of all the Conditional jump or move depends
on uninitialised value(s) issues, but unfortunately we still have an issue with
the some Invalid read of size 4.

The simplest example is the last C program:

#include <stdlib.h>
#include <string.h>

int main() {

    char *foo = calloc(3, 1);

    return strcmp(foo, "a");
}

gcc -g -O2 -o t t.c

valgrind -q ./t
==31430== Invalid read of size 4
==31430== at 0x10000510: main (t.c:8)
==31430== Address 0x42e0044 is 1 bytes after a block of size 3 alloc'd
==31430== at 0x40874C8: calloc (vg_replace_malloc.c:752)
==31430== by 0x100004FF: main (t.c:6)
==31430==

The issue is the following ldbrx:

Dump of assembler code for function main:
0x00000000100004e0 <+0>: lis r2,4098
0x00000000100004e4 <+4>: addi r2,r2,32512
0x00000000100004e8 <+8>: mflr r0
0x00000000100004ec <+12>: li r4,1
0x00000000100004f0 <+16>: li r3,3
0x00000000100004f4 <+20>: std r0,16(r1)
0x00000000100004f8 <+24>: stdu r1,-32(r1)
0x00000000100004fc <+28>: bl 0x10000480 <00000022.plt_call.calloc@@GLIBC_2.17>
0x0000000010000500 <+32>: ld r2,24(r1)
0x0000000010000504 <+36>: addis r4,r2,-2
0x0000000010000508 <+40>: li r10,0
0x000000001000050c <+44>: addi r4,r4,-30120
=> 0x0000000010000510 <+48>: ldbrx r7,0,r3
0x0000000010000514 <+52>: ldbrx r8,0,r4
0x0000000010000518 <+56>: cmpb r10,r7,r10
0x000000001000051c <+60>: cmpb r9,r7,r8
0x0000000010000520 <+64>: orc. r10,r10,r9
0x0000000010000524 <+68>: bne 0x10000548 <main+104>
0x0000000010000528 <+72>: addi r9,r3,8
0x000000001000052c <+76>: ldbrx r7,0,r9
0x0000000010000530 <+80>: addi r9,r4,8
0x0000000010000534 <+84>: ldbrx r8,0,r9
0x0000000010000538 <+88>: cmpb r10,r7,r10
0x000000001000053c <+92>: cmpb r9,r7,r8
0x0000000010000540 <+96>: orc. r10,r10,r9
0x0000000010000544 <+100>: beq 0x10000570 <main+144>
0x0000000010000548 <+104>: cntlzd r9,r10
0x000000001000054c <+108>: addi r9,r9,8
0x0000000010000550 <+112>: rldcl r3,r7,r9,56
0x0000000010000554 <+116>: rldcl r9,r8,r9,56
0x0000000010000558 <+120>: subf r3,r9,r3
0x000000001000055c <+124>: addi r1,r1,32
0x0000000010000560 <+128>: extsw r3,r3
0x0000000010000564 <+132>: ld r0,16(r1)
0x0000000010000568 <+136>: mtlr r0
0x000000001000056c <+140>: blr
0x0000000010000570 <+144>: addi r9,r3,16
0x0000000010000574 <+148>: ldbrx r7,0,r9
0x0000000010000578 <+152>: addi r9,r4,16
0x000000001000057c <+156>: ldbrx r8,0,r9
0x0000000010000580 <+160>: cmpb r10,r7,r10
0x0000000010000584 <+164>: cmpb r9,r7,r8
0x0000000010000588 <+168>: orc. r10,r10,r9
0x000000001000058c <+172>: bne 0x10000548 <main+104>
0x0000000010000590 <+176>: addi r9,r3,24
0x0000000010000594 <+180>: ldbrx r7,0,r9
0x0000000010000598 <+184>: addi r9,r4,24
0x000000001000059c <+188>: ldbrx r8,0,r9
0x00000000100005a0 <+192>: cmpb r10,r7,r10
0x00000000100005a4 <+196>: cmpb r9,r7,r8
0x00000000100005a8 <+200>: orc. r10,r10,r9
0x00000000100005ac <+204>: bne 0x10000548 <main+104>
0x00000000100005b0 <+208>: addi r9,r3,32
0x00000000100005b4 <+212>: ldbrx r7,0,r9
0x00000000100005b8 <+216>: addi r9,r4,32
0x00000000100005bc <+220>: ldbrx r8,0,r9
0x00000000100005c0 <+224>: cmpb r10,r7,r10
0x00000000100005c4 <+228>: cmpb r9,r7,r8
0x00000000100005c8 <+232>: orc. r10,r10,r9
0x00000000100005cc <+236>: bne 0x10000548 <main+104>
0x00000000100005d0 <+240>: addi r9,r3,40
0x00000000100005d4 <+244>: ldbrx r7,0,r9
0x00000000100005d8 <+248>: addi r9,r4,40
0x00000000100005dc <+252>: ldbrx r8,0,r9
0x00000000100005e0 <+256>: cmpb r10,r7,r10
0x00000000100005e4 <+260>: cmpb r9,r7,r8
0x00000000100005e8 <+264>: orc. r10,r10,r9
0x00000000100005ec <+268>: bne 0x10000548 <main+104>
0x00000000100005f0 <+272>: addi r9,r3,48
0x00000000100005f4 <+276>: ldbrx r7,0,r9
0x00000000100005f8 <+280>: addi r9,r4,48
0x00000000100005fc <+284>: ldbrx r8,0,r9
0x0000000010000600 <+288>: cmpb r10,r7,r10
0x0000000010000604 <+292>: cmpb r9,r7,r8
0x0000000010000608 <+296>: orc. r10,r10,r9
0x000000001000060c <+300>: bne 0x10000548 <main+104>
0x0000000010000610 <+304>: addi r9,r3,56
0x0000000010000614 <+308>: ldbrx r7,0,r9
0x0000000010000618 <+312>: addi r9,r4,56
0x000000001000061c <+316>: ldbrx r8,0,r9
0x0000000010000620 <+320>: cmpb r10,r7,r10
0x0000000010000624 <+324>: cmpb r9,r7,r8
0x0000000010000628 <+328>: orc. r10,r10,r9
0x000000001000062c <+332>: bne 0x10000548 <main+104>
0x0000000010000630 <+336>: addi r4,r4,64
0x0000000010000634 <+340>: addi r3,r3,64
0x0000000010000638 <+344>: bl 0x100004c0 <00000022.plt_call.strcmp@@GLIBC_2.17>
0x000000001000063c <+348>: ld r2,24(r1)
0x0000000010000640 <+352>: b 0x1000055c <main+124>
0x0000000010000644 <+356>: .long 0x0
0x0000000010000648 <+360>: .long 0x1000000
0x000000001000064c <+364>: .long 0x80
Mark Wielaard
2018-11-29 22:41:51 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #40 from Mark Wielaard <***@klomp.org> ---
Created attachment 116575
--> https://bugs.kde.org/attachment.cgi?id=116575&action=edit
t binary testcase

The t binary from comment #39 to help with debugging.
Mark Wielaard
2018-11-29 23:55:06 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #41 from Mark Wielaard <***@klomp.org> ---
Hoping that the JRS FIXME for ldbrx (Load Doubleword Byte-Reverse Indexed) was
a hint that memcheck would be helped by a single doubleword load plus an
Iop_Reverse, I hacked up:
   DIP("ldbrx r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr);
   IRTemp dw1 = newTemp(Ity_I64);
   IRTemp dw2 = newTemp(Ity_I64);

   assign( dw1, load( Ity_I64, mkexpr(EA)) );
   assign( dw2, unop(Iop_Reverse8sIn64_x1, mkexpr(dw1)) );
   putIReg( rD_addr, mkexpr(dw2) );

But the ppc backend doesn't handle Iop_Reverse8sIn64_x1, so that didn't really
go anywhere for now.
Aaron Sawdey
2018-11-30 03:02:13 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #42 from Aaron Sawdey <***@linux.ibm.com> ---
I think you're on the right track with ldbrx. The only difference between the
code that gcc is generating and what is in glibc is ldbrx vs. ld; here is what
is in glibc trunk:

   /* For short string up to 16 bytes, load both s1 and s2 using
      unaligned dwords and compare.  */
   ld    r8,0(r3)
   ld    r10,0(r4)
   cmpb  r12,r8,r0
   cmpb  r11,r8,r10
   orc.  r9,r12,r11
   bne   cr0,L(different_nocmpb)

The glibc code has a couple more instructions afterwards because we don't have
a count trailing zeros instruction that would make the most sense for the byte
ordering that ld gives you on little endian.
Mark Wielaard
2018-11-30 14:21:20 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #43 from Mark Wielaard <***@klomp.org> ---
Created attachment 116586
--> https://bugs.kde.org/attachment.cgi?id=116586&action=edit
PPC Iop_Reverse8sIn64_x1 implementation

This compiles and runs, but hasn't been tested at all (combine with the patch
in comment #41).
Mark Wielaard
2018-11-30 21:21:50 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

Mark Wielaard <***@klomp.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #116586|0 |1
is obsolete| |

--- Comment #44 from Mark Wielaard <***@klomp.org> ---
Created attachment 116593
--> https://bugs.kde.org/attachment.cgi?id=116593&action=edit
implement ldbrx as 64-bit load and Iop_Reverse8sIn64_x1

Real implementation of ldbrx as a 64-bit load and Iop_Reverse8sIn64_x1.

This works well and resolves the issue with (aligned) ldbrx instructions
reading past a memory block.

It has only been tested on a ppc64le power9 setup for now.
Could someone who actually knows ppc64 review this implementation and say
whether it is good enough for other setups too?
Mark Wielaard
2018-11-30 21:45:07 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #45 from Mark Wielaard <***@klomp.org> ---
Unfortunately even with the ldbrx patch (which seems an OK improvement in
itself) we still have some issues.

ldbrx is also used on unaligned addresses, in which case even
--partial-loads-ok=yes doesn't help us.

For example, this little program has a string table with some strings at
odd addresses:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    char *strtable = malloc (5);
    // First element is the empty string.
    strtable[0] = '\0';
    // Second (and last) element is a short string.
    strcpy (&strtable[1], "abc");

    // What was the second string again?
    int equal = strcmp (argv[0], &strtable[1]);
    if (equal == 0)
        puts ("yes");
    else
        puts ("no");

    free (strtable);
    return 0;
}

gcc -Wall -g -O2 -o table table.c
valgrind -q ./table
==7902== Invalid read of size 8
==7902== at 0x10000630: main (table.c:14)
==7902== Address 0x42e0041 is 1 bytes inside a block of size 5 alloc'd
==7902== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7902== by 0x100005A7: main (table.c:7)
==7902==
no

Dump of assembler code for function main:
0x0000000010000580 <+0>: lis r2,4098
0x0000000010000584 <+4>: addi r2,r2,32512
0x0000000010000588 <+8>: mflr r0
0x000000001000058c <+12>: std r30,-16(r1)
0x0000000010000590 <+16>: std r31,-8(r1)
0x0000000010000594 <+20>: li r3,5
0x0000000010000598 <+24>: mr r30,r4
0x000000001000059c <+28>: std r0,16(r1)
0x00000000100005a0 <+32>: stdu r1,-48(r1)
0x00000000100005a4 <+36>: bl 0x10000560 <00000022.plt_call.malloc@@GLIBC_2.17>
0x00000000100005a8 <+40>: ld r2,24(r1)
0x00000000100005ac <+44>: lis r10,99
0x00000000100005b0 <+48>: li r8,0
0x00000000100005b4 <+52>: ori r10,r10,25185
0x00000000100005b8 <+56>: mr r31,r3
0x00000000100005bc <+60>: ld r3,0(r30)
0x00000000100005c0 <+64>: addi r4,r31,1
0x00000000100005c4 <+68>: stb r8,0(r31)
0x00000000100005c8 <+72>: stw r10,1(r31)
0x00000000100005cc <+76>: clrldi r9,r3,52
0x00000000100005d0 <+80>: cmpdi cr7,r9,4032
0x00000000100005d4 <+84>: bge cr7,0x100005e4 <main+100>
0x00000000100005d8 <+88>: clrldi r9,r4,52
0x00000000100005dc <+92>: cmpdi cr7,r9,4032
0x00000000100005e0 <+96>: blt cr7,0x1000062c <main+172>
0x00000000100005e4 <+100>: bl 0x10000520 <00000022.plt_call.strcmp@@GLIBC_2.17>
0x00000000100005e8 <+104>: ld r2,24(r1)
0x00000000100005ec <+108>: cmpwi cr7,r3,0
0x00000000100005f0 <+112>: bne cr7,0x1000074c <main+460>
0x00000000100005f4 <+116>: addis r3,r2,-2
0x00000000100005f8 <+120>: addi r3,r3,-29840
0x00000000100005fc <+124>: bl 0x10000540 <00000022.plt_call.puts@@GLIBC_2.17>
0x0000000010000600 <+128>: ld r2,24(r1)
0x0000000010000604 <+132>: mr r3,r31
0x0000000010000608 <+136>: bl 0x100004e0 <00000022.plt_call.free@@GLIBC_2.17>
0x000000001000060c <+140>: ld r2,24(r1)
0x0000000010000610 <+144>: addi r1,r1,48
0x0000000010000614 <+148>: li r3,0
0x0000000010000618 <+152>: ld r0,16(r1)
0x000000001000061c <+156>: ld r30,-16(r1)
0x0000000010000620 <+160>: ld r31,-8(r1)
0x0000000010000624 <+164>: mtlr r0
0x0000000010000628 <+168>: blr
0x000000001000062c <+172>: ldbrx r8,0,r3
=> 0x0000000010000630 <+176>: ldbrx r9,0,r4
0x0000000010000634 <+180>: li r10,0
0x0000000010000638 <+184>: cmpb r7,r8,r9
0x000000001000063c <+188>: cmpb r10,r8,r10
0x0000000010000640 <+192>: orc. r10,r10,r7
0x0000000010000644 <+196>: bne 0x10000734 <main+436>
0x0000000010000648 <+200>: addi r9,r3,8
0x000000001000064c <+204>: ldbrx r8,0,r9
0x0000000010000650 <+208>: addi r9,r4,8
0x0000000010000654 <+212>: ldbrx r9,0,r9
0x0000000010000658 <+216>: cmpb r10,r8,r10
0x000000001000065c <+220>: cmpb r7,r8,r9
0x0000000010000660 <+224>: orc. r10,r10,r7
0x0000000010000664 <+228>: bne 0x10000734 <main+436>

This means we have to handle unaligned partial reads somehow.
I am not sure we can. The code in memcheck/mc_main.c (mc_LOADVn_slow) dealing
with partial reads seems to rely heavily on at least word alignment.
Julian Seward
2018-12-02 16:04:34 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #46 from Julian Seward <***@acm.org> ---
(In reply to Mark Wielaard from comment #44)
> Created attachment 116593 [details]
> implement ldbrx as 64-bit load and Iop_Reverse8sIn64_x1
> Real implementation of ldbrx as 64-bit load and Iop_Reverse8sIn64_x1.

Regardless of the fact that there are still misaligned ldbrxs to deal with,
I think this is good and should land. I would ask:

* for "case 0x214: // ldbrx (Load Doubleword Byte-Reverse Indexed)"
please add a comment to say that dis_int_ldst_rev's caller guarantees
to ask for this instruction to be decoded only in 64-bit mode

* for the backend expression, it's fine, except that for
// r = (r & 0x00000000FFFFFFFF) << 32 | (r & 0xFFFFFFFF00000000) >> 32;
there's no need to do the masking since the shifts will remove all of
the bits that just got masked out anyway. So we can skip the _LI, the NOT
and the two ANDs.

* oh .. and I think the whole thing (insn selection for Iop_Reverse8sIn64_x1)
needs to be guarded by the is-this-64-bit-mode boolean (env->mode64 or
something like that) since we don't want to be emitting this bit of code in
32 bit mode.
Julian Seward
2018-12-02 16:13:24 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #47 from Julian Seward <***@acm.org> ---
(In reply to Mark Wielaard from comment #45)
> Unfortunately even with the ldbrx patch (which seems an OK improvement in
> itself) we still have some issues.
> ldbrx is also used on unaligned addresses. In which case even
> --partial-loads-ok=yes doesn't help us.
> [..]
> This means we have to handle unaligned partial reads somehow.
> I am not sure we can. The code in memcheck/mc_main.c (mc_LOADVn_slow)
> dealing with partial reads seems to very much rely on at least word
> alignment.

That doesn't worry me so much. I'm sure we can come up with some ppc64le-only
special-casing in mc_LOADVn_slow to deal with it.

What worries me more is the base compiler pipeline (ignoring Memcheck). We'll
have a misaligned ldbrx going into the pipeline, which will eventually get
re-emitted as a plain, vanilla, etc, 64-bit integer load (ld). Will that work
properly for a misaligned address? Or will it trap (sigbus?) Or will it
silently zero out the three lowest address bits before performing the load?
This is only going to work correctly if it correctly handles the misalignment.
Carl Love
2018-12-03 18:47:23 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #48 from Carl Love <***@us.ibm.com> ---
Mark:

I took your patch "implement ldbrx as 64-bit load and Iop_Reverse8sIn64_x1"
from comment 44 (attachment 116593) and applied it to current mainline.

commit f2387ed7bed64571197b44c1f8abae4d907bce1b
Author: Mark Wielaard <***@klomp.org>
Date: Fri Nov 30 15:23:06 2018 -0500

Implement ppc64 ldbrx as 64-bit load and Iop_Reverse8sIn64_x1.

commit 206e81e8add574bbb38c11105005decedabc2f4a
Author: Mark Wielaard <***@klomp.org>
Date: Sun Dec 2 12:39:27 2018 +0100

Fix tsan_unittest.cpp compile error with older compilers.

Older compilers (g++ 4.8.5) don't like '>>':
error: ‘>>’ should be ‘> >’ within a nested template argument list.
Add an extra space.

Then compiled and installed valgrind.

I then ran your string program in comment 45 on P7, P8 BE, P8 LE, P9 as
follows:

gcc -o mark_test ../mark_test.c
valgrind ./mark_test

On Power 7, Power 8 BE, Power 8 LE, and Power 9 LE, running the test gives:

valgrind ./mark_test
==18373== Memcheck, a memory error detector
==18373== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18373== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright
info
==18373== Command: ./mark_test
==18373==
no
==18373==
==18373== HEAP SUMMARY:
==18373== in use at exit: 0 bytes in 0 blocks
==18373== total heap usage: 1 allocs, 1 frees, 5 bytes allocated
==18373==
==18373== All heap blocks were freed -- no leaks are possible
==18373==
==18373== For counts of detected and suppressed errors, rerun with: -v
==18373== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Running natively on all of these systems gives:

./mark_test
no


I read through the patch; it looks fine. I do think there might be an
easier/faster way to implement the ldbrx instruction.

Your patch does the following:

+ assign( dw1, load(Ity_I64, mkexpr(EA)) );
+ assign( dw2, unop(Iop_Reverse8sIn64_x1, mkexpr(dw1)) );
+ putIReg( rD_addr, mkexpr(dw2) );

Note, load(Ity_I64, mkexpr(EA)) does an endian aware load:

/* This generates a normal (non load-linked) load. */
static IRExpr* load ( IRType ty, IRExpr* addr )
{
   if (host_endness == VexEndnessBE)
      return IRExpr_Load(Iend_BE, ty, addr);
   else
      return IRExpr_Load(Iend_LE, ty, addr);
}

I think if you were to call IRExpr_Load to do the load as follows:

/* Load the value with the bytes reversed by doing a BE load on an LE machine
   and a LE load on a BE machine. */
if (host_endness == VexEndnessBE)
   assign( dw1, IRExpr_Load(Iend_LE, Ity_I64, mkexpr(EA)));
else
   assign( dw1, IRExpr_Load(Iend_BE, Ity_I64, mkexpr(EA)));

putIReg( rD_addr, mkexpr(dw1) );

You wouldn't actually need to do the byte reverse with Iop_Reverse8sIn64_x1, as
the value would be loaded with the bytes reversed already. That said, support
for Iop_Reverse8sIn64_x1 is nice to have around for future use. :-)

Anyway, I worked through the logic and coding for Iop_Reverse8sIn64_x1 and
it all looks correct.

Carl Love
Mark Wielaard
2018-12-05 13:45:41 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #49 from Mark Wielaard <***@klomp.org> ---
We did discuss this a bit more on irc (#valgrind irc.freenode.net) and there
are a couple more issues that need to be resolved to make all this work
correctly.

lxvd2x and lxvb16x have a similar issue where the load is split up in
the frontend into smaller loads, which makes the partial-loads-ok logic
ineffective for memcheck. These also need to be rewritten to do full loads
plus IR to manipulate the data so that memcheck can work effectively.

The IRExpr_Load expression carries an IREndness argument expressing the
endianness of the load, which should be helpful in the frontend to express
some of these instructions.

Unaligned (normal) load instructions do seem to be undefined. The power backend
does handle unaligned vector (128-bit) loads (in VEX/priv/host_ppc_isel.c,
iselVecExpr_wrk), but not non-vector loads. So given that unaligned loads are
an issue, we need some solution for them in the ppc backend (if ldbrx is
ultimately translated into normal loads).

memcheck mc_LOADVn_slow needs to be adapted to deal with unaligned loads (on
ppc only?). And I assume mc_LOADV_128_or_256_slow would need something similar
for unaligned vector loads (assuming lxvd2x and lxvb16x are allowed to do
unaligned loads).

There was also a question from the glibc hackers whether these vector
instructions should be emitted at all, given some power errata that talked
about memory/cache corruption: https://gcc.gnu.org/ml/gcc/2018-12/msg00010.html
But it looks like that is not a real concern, so we should act as if gcc will
emit these instructions for new inlined str[n]cmp code.
Mark Wielaard
2018-12-05 16:42:23 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #50 from Mark Wielaard <***@klomp.org> ---
Created attachment 116688
--> https://bugs.kde.org/attachment.cgi?id=116688&action=edit
aligned and unaligned ldbrx, lxvd2x and lxvb16x testcases

These are the assembly files generated by gcc8 with the new strcmp inlined
code.

The aligned code shows a short string being compared where the instruction is
split up into multiple loads (by the frontend!), one of which memcheck will
think is completely past addressable memory.

The unaligned code shows the strcmp using one of these double word or vector
instructions on an unaligned memory address, which also confuses memcheck.

The following is what current valgrind produces on a power9 ppc64le setup:

# for a in aligned unaligned; do for i in ldbrx lxvd2x lxvb16x; do echo $a-$i;
gcc -o $a-$i $a-$i.s; valgrind -q ./$a-$i > /dev/null; done; done
aligned-ldbrx
==7890== Invalid read of size 4
==7890== at 0x1000061C: main (in /root/v-ppc64-stcmp/aligned-ldbrx)
==7890== Address 0x42e0044 is 0 bytes after a block of size 4 alloc'd
==7890== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7890== by 0x100005A7: main (in /root/v-ppc64-stcmp/aligned-ldbrx)
==7890==
aligned-lxvd2x
==7895== Invalid read of size 8
==7895== at 0x10000628: main (in /root/v-ppc64-stcmp/aligned-lxvd2x)
==7895== Address 0x42e0048 is 4 bytes after a block of size 4 alloc'd
==7895== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7895== by 0x100005A7: main (in /root/v-ppc64-stcmp/aligned-lxvd2x)
==7895==
==7895== Use of uninitialised value of size 8
==7895== at 0x4050794: _vgnU_freeres (vg_preloaded.c:68)
==7895== by 0x4114A03: __run_exit_handlers (in
/usr/lib64/power9/libc-2.28.so)
==7895== by 0x4114A97: exit (in /usr/lib64/power9/libc-2.28.so)
==7895== by 0x40F417F: (below main) (in /usr/lib64/power9/libc-2.28.so)
==7895==
==7895== Use of uninitialised value of size 8
==7895== at 0x40505C4: ??? (in
/usr/lib64/valgrind/vgpreload_core-ppc64le-linux.so)
==7895== by 0x4114A03: __run_exit_handlers (in
/usr/lib64/power9/libc-2.28.so)
==7895== by 0x4114A97: exit (in /usr/lib64/power9/libc-2.28.so)
==7895== by 0x40F417F: (below main) (in /usr/lib64/power9/libc-2.28.so)
==7895==
aligned-lxvb16x
==7900== Invalid read of size 1
==7900== at 0x10000628: main (in /root/v-ppc64-stcmp/aligned-lxvb16x)
==7900== Address 0x42e0044 is 0 bytes after a block of size 4 alloc'd
==7900== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7900== by 0x100005A7: main (in /root/v-ppc64-stcmp/aligned-lxvb16x)
==7900==
unaligned-ldbrx
==7905== Invalid read of size 4
==7905== at 0x10000630: main (in /root/v-ppc64-stcmp/unaligned-ldbrx)
==7905== Address 0x42e0045 is 0 bytes after a block of size 5 alloc'd
==7905== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7905== by 0x100005A7: main (in /root/v-ppc64-stcmp/unaligned-ldbrx)
==7905==
unaligned-lxvd2x
==7910== Invalid read of size 8
==7910== at 0x10000630: main (in /root/v-ppc64-stcmp/unaligned-lxvd2x)
==7910== Address 0x42e0041 is 1 bytes inside a block of size 5 alloc'd
==7910== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7910== by 0x100005A7: main (in /root/v-ppc64-stcmp/unaligned-lxvd2x)
==7910==
unaligned-lxvb16x
==7915== Invalid read of size 1
==7915== at 0x10000630: main (in /root/v-ppc64-stcmp/unaligned-lxvb16x)
==7915== Address 0x42e0045 is 0 bytes after a block of size 5 alloc'd
==7915== at 0x408420C: malloc (vg_replace_malloc.c:299)
==7915== by 0x100005A7: main (in /root/v-ppc64-stcmp/unaligned-lxvb16x)
==7915==
Mark Wielaard
2018-12-07 16:22:29 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #51 from Mark Wielaard <***@klomp.org> ---
(In reply to Carl Love from comment #48)
Post by Carl Love
/* Load the value with the bytes reversed by doing a BE load on an LE machine
and a LE load on a BE machine. */
if (host_endness == VexEndnessBE)
assign( dw1, IRExpr_Load(Iend_LE, Ity_I64, mkexpr(EA)));
else
assign( dw1, IRExpr_Load(Iend_BE, Ity_I64, mkexpr(EA)));
putIReg( rD_addr, mkexpr(dw1) );
You wouldn't actually need to do the byte reverse with Iop_Reverse8sIn64_x1
as the value would be loaded with the bytes reversed already.
I tried this, but seem to hit the following in the host_ppc_isel.c backend:

static HReg iselWordExpr_R_wrk ( ISelEnv* env, const IRExpr* e,
IREndness IEndianess )

[...]
/* --------- LOAD --------- */
case Iex_Load: {
HReg r_dst;
PPCAMode* am_addr;
if (e->Iex.Load.end != IEndianess)
goto irreducible;

Either I am misunderstanding how the IEndianess is passed around, or it isn't
actually possible to have a load with a different endianness from the arch
endianness.
Julian Seward
2018-12-07 16:46:00 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #52 from Julian Seward <***@acm.org> ---
(In reply to Mark Wielaard from comment #51)
Post by Mark Wielaard
static HReg iselWordExpr_R_wrk ( ISelEnv* env, const IRExpr* e,
IREndness IEndianess )
[...]
/* --------- LOAD --------- */
case Iex_Load: {
HReg r_dst;
PPCAMode* am_addr;
if (e->Iex.Load.end != IEndianess)
goto irreducible;
Either I am misunderstanding how the IEndianess is passed around, or it
isn't actually possible to have a different endian load from the arch
endianness.
To clarify: there are two options here:

(1) in the front end, generate an IR load with the same endianness as the
host, and then byte-reverse the vector in the IR.

(2) in the front end, generate an IR load with the opposite endianness as
the host.

The effect of (1) is that the load will arrive and be handled at the
back end no problem, and then the code generated from the swap-endianness
IR will .. well .. swap the endianness of the value.

Whereas with (2), an opposite-endianness-to-the-host IR load travels
through to the back end. At that point your options are:

(2a) directly generate an opposite-endian load, or,
(2b) generate a same-endian load, followed by code to swap the vector around.

So: (2a) is the most direct route. (1) and (2b) are essentially the same,
with the difference that (1) does the rearrangement in IR whereas (2b) does
the rearrangement at the POWER instruction level (in the backend).

Personally I'd vote for (1) because (a) it's easier and (b) I am not sure
that Memcheck will work correctly when presented with opposite-from-the-host
endian loads/stores.
Mark Wielaard
2018-12-07 22:23:25 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

Mark Wielaard <***@klomp.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #116593|0 |1
is obsolete| |

--- Comment #53 from Mark Wielaard <***@klomp.org> ---
Created attachment 116742
--> https://bugs.kde.org/attachment.cgi?id=116742&action=edit
Option1: Implement ldbrx as 64-bit load and Iop_Reverse8sIn64_x1

(In reply to Julian Seward from comment #52)
Post by Julian Seward
Personally I'd vote for (1) because (a) it's easier and (b) I am not sure
that Memcheck will work correctly when presented with opposite-from-the-host
endian loads/stores.
Let's go with option 1. That is the code that this patch implements.

Changes from the previous version are:
1) Added a comment that there is an earlier check that ldbrx is only used in
64bit mode.
2) Added a comment explaining what we could do if we went with option 2.
3) Add a vassert(mode64) for the usage of Iop_Reverse8sIn64_x1 in
host_ppc_isel.c.
4) Simplified the last part of Iop_Reverse8sIn64_x1 by removing the unnecessary
masking as suggested by Julian.

This makes the ldbrx-aligned testcase work as expected.
Mark Wielaard
2018-12-07 22:37:31 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #54 from Mark Wielaard <***@klomp.org> ---
To make the ldbrx-unaligned testcase work we can use something like the
following:

diff --git a/memcheck/mc_main.c b/memcheck/mc_main.c
index 3ef7cb913..9a7b04c9c 100644
--- a/memcheck/mc_main.c
+++ b/memcheck/mc_main.c
@@ -1508,6 +1508,10 @@ ULong mc_LOADVn_slow ( Addr a, SizeT nBits, Bool bigendian )
# if defined(VGA_mips64) && defined(VGABI_N32)
if (szB == VG_WORDSIZE * 2 && VG_IS_WORD_ALIGNED(a)
&& n_addrs_bad < VG_WORDSIZE * 2)
+#elif defined(VGA_ppc64be) || defined(VGA_ppc64le)
+ /* Unaligned loads of words on the same page are OK. */
+ if (szB == VG_WORDSIZE && ((a & 0xfff) <= 0x1000 - szB)
+ && n_addrs_bad < VG_WORDSIZE)
# else
if (szB == VG_WORDSIZE && VG_IS_WORD_ALIGNED(a)
&& n_addrs_bad < VG_WORDSIZE)

The question is whether the extra check for the word not crossing a page
boundary is valid.

Aaron, are there are alignment constraints for the usage of the ldbrx
instruction?

(this would also need new exp files for the memcheck/tests/partial_load_ok and
partial_load_dflt tests that check unaligned word loads produce a warning)
Aaron Sawdey
2018-12-08 00:03:37 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #55 from Aaron Sawdey <***@linux.ibm.com> ---
Mark,
There are no alignment constraints on ldbrx; not having to check the alignment
is what makes this whole thing work. If I had to do runtime alignment checks it
would be too much code to generate inline.
Mark Wielaard
2018-12-08 21:40:58 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #56 from Mark Wielaard <***@klomp.org> ---
Created attachment 116790
--> https://bugs.kde.org/attachment.cgi?id=116790&action=edit
memcheck: Allow unaligned loads of words on ppc64[le]

(In reply to Aaron Sawdey from comment #55)
Post by Aaron Sawdey
No alignment constraints on ldbrx, not having to check the alignment is
what makes this whole thing work. If I had to do runtime alignment checks it
would be too much code to generate inline.
Thanks. In that case I propose this patch, which allows any unaligned
word-sized load on ppc64[le] and adjusts the partial-loads testcases.
Mark Wielaard
2018-12-09 22:55:24 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #61 from Mark Wielaard <***@klomp.org> ---
The lxvb16x change does resolve the invalid load issues, but does not
completely resolve the conditional-jump-depends-on-undefined-value issues.
These show up again when the CR6 condition is used to branch:

0x000000001000062c <+172>: lxvb16x vs33,0,r3
0x0000000010000630 <+176>: lxvb16x vs45,0,r4
0x0000000010000634 <+180>: vcmpnezb. v0,v1,v13
=> 0x0000000010000638 <+184>: bne cr6,0x10000690 <main+272>
Mark Wielaard
2018-12-09 13:55:51 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #59 from Mark Wielaard <***@klomp.org> ---
The last two patches get rid of the invalid loads for the power8 vsx inlined
strcmp sequences, but still leave us with conditional jumps depending on
undefined values.

The issue is that we will hit the following:

10000624: 99 1e 20 7c lxvd2x vs33,0,r3
10000628: 99 fe 00 7c lxvd2x vs32,0,r31
1000062c: 8c 03 a0 11 vspltisw v13,0
10000630: 00 00 40 39 li r10,0
10000634: 06 00 81 11 vcmpequb v12,v1,v0
10000638: 06 68 01 10 vcmpequb v0,v1,v13
1000063c: 57 65 00 f0 xxlorc vs32,vs32,vs44
10000640: 06 6c 20 10 vcmpequb. v1,v0,v13
10000644: 78 00 98 40 bge cr6,100006bc <main+0x13c>

The bge depends on the CR6 flag, which is set from the result of the
vcmpequb. comparison of two possibly not completely defined vectors.
This is done in set_AV_CR6 () in guest_ppc_toIR.c:

/* Set the CR6 flags following an AltiVec compare operation.
* NOTE: This also works for VSX single-precision compares.
* */
static void set_AV_CR6 ( IRExpr* result, Bool test_all_ones )

Called as set_AV_CR6( mkexpr(vD), True ); after interpreting the vcmpequb as
assign( vD, binop(Iop_CmpEQ8x16, mkexpr(vA), mkexpr(vB)) );
Mark Wielaard
2018-12-09 11:39:54 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #57 from Mark Wielaard <***@klomp.org> ---
Created attachment 116801
--> https://bugs.kde.org/attachment.cgi?id=116801&action=edit
Implement ppc64 lxvd2x as 128-bit load with double word swap for ppc64le.

This fixes the partial load issues seen with lxvd2x by replacing the separate
64-bit loads with one 128-bit load, with the double words swapped afterwards
if necessary (on ppc64le).
Julian Seward
2018-12-10 11:12:54 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #62 from Julian Seward <***@acm.org> ---
Created attachment 116827
--> https://bugs.kde.org/attachment.cgi?id=116827&action=edit
Possible fix for the comment 61 problems

This patch:

* changes set_AV_CR6 so that it does scalar comparisons against zero,
rather than sometimes against an all-ones word. This is something
that Memcheck can instrument exactly.

* in Memcheck, requests expensive instrumentation of Iop_Cmp{EQ,NE}64
by default on ppc64le.

Note, this is all untested! In particular it would be good to verify
that the CR6 results are unchanged.
Mark Wielaard
2018-12-09 22:45:40 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #60 from Mark Wielaard <***@klomp.org> ---
Created attachment 116818
--> https://bugs.kde.org/attachment.cgi?id=116818&action=edit
Implement ppc64 lxvb16x as 128-bit vector load with reversed double words

This fixes the partial load issues seen with lxvb16x by replacing the small
loads with one 128-bit load and using Iop_Reverse8sIn64_x1 to reverse the
bytes in the double words; on ppc64le the double words are swapped afterwards.

Tested on power9 ppc64le.

Not tested on big endian ppc64 since I couldn't find a power9 big endian ppc64
install.
Mark Wielaard
2018-12-09 13:30:47 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #58 from Mark Wielaard <***@klomp.org> ---
Created attachment 116808
--> https://bugs.kde.org/attachment.cgi?id=116808&action=edit
Allow unaligned loads of 128bit vectors on ppc64[le]

lxvd2x might generate an unaligned 128-bit vector load; allow those for
partial loads on ppc64[le].
Mark Wielaard
2018-12-10 22:26:41 UTC
Permalink
https://bugs.kde.org/show_bug.cgi?id=386945

--- Comment #63 from Mark Wielaard <***@klomp.org> ---
(In reply to Julian Seward from comment #62)
Created attachment 116827 [details]
Possible fix for the comment 61 problems
* changes set_AV_CR6 so that it does scalar comparisons against zero,
rather than sometimes against an all-ones word. This is something
that Memcheck can instrument exactly.
* in Memcheck, requests expensive instrumentation of Iop_Cmp{EQ,NE}64
by default on ppc64le.
Note, this is all untested! In particular it would be good to verify
that the CR6 results are unchanged.
That definitely fixes the issue. I tested on both power9 and power8 with a gcc
and glibc build with the new str[n]cmp inline patch plus the following fixes
from this bug:

- Implement ppc64 ldbrx as 64-bit load and Iop_Reverse8sIn64_x1.
- memcheck: Allow unaligned loads of words on ppc64[le].
- Implement ppc64 lxvd2x as 128-bit load with double word swap for ppc64le.
- memcheck: Allow unaligned loads of 128bit vectors on ppc64[le].
- Implement ppc64 lxvb16x as 128-bit vector load with reversed double words.

And it seems to resolve all gcc strcmp inline issues I was seeing before.
Both setups were ppc64le. I haven't yet tested on ppc64 or ppc32.

I do see bug #401827 and bug #401828 but they are unrelated.

Also memcheck/tests/undef_malloc_args (stderr) fails. That one does seem to be
caused by earlier patches from this bug.

And occasionally I see Use of uninitialised value of size 8 in _vgnU_freeres
(vg_preloaded.c:68) which is mysterious. It goes away with
--run-libc-freeres=no --run-cxx-freeres=no. It seems unrelated to this issue.
But I haven't found out yet what is causing it. It is as if sometimes we are
unable to see that the weak symbol _ZN9__gnu_cxx9__freeresEv
(__gnu_cxx::__freeres) is initialized. Very odd, since it normally works except
in some cases. But it is very likely unrelated to this bug.