x86/emul: Fix extable registration in invoke_stub()

For exception recovery in the stubs, the registered address for fixup is the
return address of the CALL entering the stub.

In invoke_stub(), the '.Lret%=:' label is on the wrong side of the 'post'
parameter. The 'post' parameter is non-empty in cases where the arithmetic
flags of the operation need recovering.

Split the line to separate 'pre' and 'post', making it more obvious that the
return address label was in the wrong position.

However, in the case that an exception did occur, we want to skip 'post' as
it's logically part of the operation which had already failed. Therefore, add
a new skip label and use that for the exception recovery path.

This is XSA-470 / CVE-2025-27465

Fixes: 79903e50dba9 ("x86emul: catch exceptions occurring in stubs")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

scripts/add_maintainers.pl: set double dashes for long options

The current script shows the message:
Don't forget to add the subject and message to ...
Then perform:
git send-email -to xen-devel@lists.xenproject.org ...
which has the wrong option '-to'.
This may confuse users.

Use double dashes for long options to avoid that.

Fixes: e1f912cbf717 ("scripts/add_maintainers.pl: New script")
Signed-off-by: Dmytro Prokopchuk <dmytro_prokopchuk1@epam.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

x86/EFI: restrict use of --dynamicbase

At least GNU ld 2.35 takes this option to (also) mean what newer
versions make controllable via --enable-reloc-section. Since, from there
being no relocations in check.efi (as we don't pass the option there), we
infer that we need to involve mkreloc, we'd end up with two sets of
relocations, which clearly isn't going to work. Furthermore the
relocations ld emits in this case also aren't usable: For bsp_idt[] we
end up with PE_BASE_RELOC_LOW ones, which efi_arch_relocate_image()
(deliberately) doesn't know how to deal with. (Related to that is also
why we check the number of relocations produced: The linker simply
didn't get this right there, yet.)

We also can't add the option to what we use when linking check.efi: That
ld version then would produce relocations, but 4 of them (instead of the
expected two). That would make us pass --disable-reloc-section, which
however only ld 2.36 and newer understand.

For such older binutils versions we therefore need to accept the slight
inconsistency in DLL characteristics that the earlier commit meant to
eliminate.

Fixes: f2148773b8ac ("x86/EFI: sanitize DLL characteristics in binary")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/build: pass -fzero-init-padding-bits=all to gcc15

See the respective bullet point in the Caveats section of
https://gcc.gnu.org/gcc-15/changes.html.

While I'm unaware of us currently relying on the pre-gcc15 behavior,
let's still play safe and retain what unknowingly we may have been
relying upon.

According to my observations, on x86 generated code changes
- somewhere deep in modify_bars(), presumably from the struct map_data
  initializer in apply_map() (a new MOVQ),
- in vpci_process_pending(), apparently again from the struct map_data
  initializer (and again a new MOVQ),
- near the top of find_cpio_data(), presumably from the struct cpio_data
  initializer (a MOVW changing to a MOVQ).

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/cpu-policy: Simplify logic in guest_common_default_feature_adjustments()

For features which are unconditionally set in the max policies, making the
default policy to match the host can be done with a conditional clear.

This is simpler than the unconditional clear, conditional set currently
performed.
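
The equivalence of the two approaches can be sketched as follows. This is
only an illustration: feature sets are modelled as Python sets, the function
names are hypothetical (not Xen's API), and the policy being adjusted is
assumed to start out as a copy of the max policy, in which the feature is
always set.

```python
def adjust_old(policy, host, feat):
    """Unconditional clear, followed by a conditional set."""
    policy = set(policy)
    policy.discard(feat)
    if feat in host:
        policy.add(feat)
    return policy


def adjust_new(policy, host, feat):
    """Single conditional clear; relies on feat already being set."""
    policy = set(policy)
    if feat not in host:
        policy.discard(feat)
    return policy
```

Both helpers return the same set whenever the feature is present in the
incoming policy; they only differ if it is absent, which the max policy rules
out.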

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/arm: Drop frametable_virt_end

It has never been used since its introduction and is technically dead
code, violating MISRA C.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

docs: cmdline: Update serial_tx_buffer default value

After commit 4df2e99d7314 ("console/serial: set the default transmit
buffer size in Kconfig"), the default value is set by Kconfig option
CONFIG_SERIAL_TX_BUFSIZE. Moreover it was bumped to 32KB by commit
d09e44e5d8fd ("console/serial: bump buffer from 16K to 32K").

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen: fix unspecified behavior in tr invocation

The result of the command is undefined according to the specification if
the "string2" argument in tr is shorter than "string1". GNU tr behaves
correctly by extending "string2" to repeat the last character.
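
The portable approach is to make both translation strings the same length
explicitly. A minimal sketch of the idea, written in Python rather than shell:
the character classes and the make_guard() name are assumptions for
illustration and are not taken from the Xen build scripts.

```python
import string


def make_guard(path):
    """Turn a header path into an upper-case guard identifier."""
    src = string.ascii_lowercase + "/.-"
    # One '_' per punctuation character, so both strings have equal length.
    dst = string.ascii_uppercase + "___"
    return path.translate(str.maketrans(src, dst))
```

Python's str.maketrans() refuses strings of unequal length outright, which
makes the need for the explicit padding visible.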

Fixes: eb61a4fb14d2 ("xen: fix header guard generation for asm-generic headers")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

tools/libxenguest: fix build in stubdom environment

With the introduction of the new byteswap infrastructure the build of
libxenguest for stubdoms was broken. Fix that again.

Fixes: 60dcff871e34 ("xen/decompressors: Remove use of *_to_cpup() helpers")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>

xen: move __ro_after_init section symbols to xen/sections.h

Instead of declaring __ro_after_init_{start,end} in each architecture's
asm/setup.h, move these declarations to the common header xen/sections.h.

This centralizes the declarations and reduces duplication across
architectures.

No functional change intended.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

xen/sysctl: make CONFIG_LIVEPATCH depend on CONFIG_SYSCTL

The LIVEPATCH mechanism relies on the LIVEPATCH_SYSCTL hypercall, so
CONFIG_LIVEPATCH shall depend on CONFIG_SYSCTL.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/sysctl: make CONFIG_COVERAGE depend on CONFIG_SYSCTL

Users rely on the SYSCTL_coverage_op hypercall to interact with the coverage
data, so the corresponding operations shall be wrapped with CONFIG_SYSCTL.
Right now they are compiled under CONFIG_COVERAGE, so we shall make
CONFIG_COVERAGE depend on CONFIG_SYSCTL.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/sysctl: wrap around XEN_SYSCTL_physinfo

The following functions are only used to deal with XEN_SYSCTL_physinfo,
so they shall be wrapped:
- arch_do_physinfo()
- get_outstanding_claims()

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/sysctl: wrap around XEN_SYSCTL_lockprof_op

The following function is only to serve spinlock profiling via
XEN_SYSCTL_lockprof_op, so it shall be wrapped:
- spinlock_profile_control()

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/sysctl: wrap around XEN_SYSCTL_perfc_op

perfc_control() and perfc_copy_info() are responsible for providing control
of perf counters via XEN_SYSCTL_perfc_op in DOM0, so they both shall
be wrapped.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/sysctl: make CONFIG_TRACEBUFFER depend on CONFIG_SYSCTL

Users can only access trace buffers via the XEN_SYSCTL_tbuf_op hypercall,
so we shall make CONFIG_TRACEBUFFER depend on CONFIG_SYSCTL.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen: introduce CONFIG_SYSCTL

We introduce a new Kconfig CONFIG_SYSCTL, which shall only be disabled
on some dom0less systems or PV shim on x86, to reduce Xen footprint.

Having SYSCTL without a prompt is transient and will be adjusted in the final
patch. The consequence of introducing "CONFIG_SYSCTL=y" in the .config file
generated from pvshim_defconfig is transient too, and will also be adjusted
in the final patch.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

x86/boot: move l<N>_bootmap

Having them in the general .init.data section is somewhat wasteful, due
to involved padding. Move them into .init.data.page_aligned, and place
that right after .init.bss.stack_aligned.

Overall .init.data* shrinks by slightly over 2 pages in the build I'm
looking at.
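
Where the padding comes from can be seen with a small layout calculation. This
is a sketch only: the object sizes and their ordering are made up for
illustration and do not reflect the actual contents of .init.data.

```python
def layout_size(objects):
    """Return the section size for (size, alignment) pairs placed in order."""
    offset = 0
    for size, align in objects:
        # Round the running offset up to the object's alignment.
        offset = (offset + align - 1) // align * align
        offset += size
    return offset


PAGE = 4096
# Page-aligned tables interleaved with small objects...
interleaved = [(8, 8), (PAGE, PAGE), (8, 8), (PAGE, PAGE)]
# ...versus the same objects with the page-aligned ones grouped together.
grouped = [(PAGE, PAGE), (PAGE, PAGE), (8, 8), (8, 8)]
```

With these made-up inputs, layout_size(interleaved) is 16384 bytes while
layout_size(grouped) is 8208 bytes; the difference is pure alignment padding.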

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/efi: Handle cases where file didn't come from ESP

A boot loader can load files from outside the ESP.
In such cases the device may not be provided, or the path may
be of an unsupported type.
Allow booting anyway in these cases; all required information
can be provided using a UKI or other boot loader
features.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

xen/char: wrap suspend/resume console callbacks with CONFIG_SYSTEM_SUSPEND

This patch wraps the suspend/resume console callbacks and related code within
CONFIG_SYSTEM_SUSPEND blocks. This ensures that these functions and their
calls are only included in the build when CONFIG_SYSTEM_SUSPEND is enabled.

This addresses Misra Rule 2.1 violations.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>

x86/pdx: simplify calculation of domain struct allocation boundary

When not using CONFIG_BIGMEM there are some restrictions on the address
width for allocations of the domain structure, as its PDX, truncated to 32
bits, is stashed into the page_info structure for domain allocated pages.

The current logic to calculate this limit is based on the internals of the
PDX compression used, which is not strictly required. Instead simplify the
logic to rely on the existing PDX to PFN conversion helpers used elsewhere.

This has the added benefit of allowing alternative PDX compression
algorithms to be implemented without requiring to change the calculation of
the domain structure allocation boundary.

As a side effect introduce pdx_to_paddr() conversion macro and use it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/boot: Improve paging mode diagnostics in create_dom0()

I was presented with this:

  (XEN) NX (Execute Disable) protection active
  (XEN) d0 has maximum 416 PIRQs
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Error creating d0: -95
  (XEN) ****************************************

which is less than helpful.  It turns out to be the -EOPNOTSUPP from
shadow_domain_init().

The real bug here is create_dom0() unconditionally assuming the presence of
SHADOW_PAGING.  Rework it to panic() rather than choosing a dom0_cfg which is
guaranteed to fail.  This results in:

  (XEN) NX (Execute Disable) protection active
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Neither HAP nor Shadow available for PVH domain
  (XEN) ****************************************

which is rather more helpful.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/idle: Misc cleanup

Sort includes, and drop trailing whitespace.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

Revert part of "x86/mwait-idle: disable IBRS during long idle"

Most of the patch (handling of CPUIDLE_FLAG_IBRS) is fine, but the
adjustments to mwait_idle() are not; spec_ctrl_enter_idle() does more than
just alter MSR_SPEC_CTRL.IBRS.

The only reason this doesn't need an XSA is because the unconditional
spec_ctrl_{enter,exit}_idle() in mwait_idle_with_hints() were left unaltered,
and thus the MWAIT remained properly protected.

There (would have been) two problems.  In the ibrs_disable (== deep C) case:

* On entry, VERW and RSB-stuffing are architecturally skipped.
* On exit, there's a branch crossing the WRMSR which reinstates the
   speculative safety for indirect branches.

All this change did was double up the expensive operations in the deep C case,
and fail to optimise the intended case.

I have an idea of how to plumb this more nicely, but it requires larger
changes to legacy IBRS handling to not make spec_ctrl_enter_idle() vulnerable
in other ways.  In the short term, simply take out the perf hit.

Fixes: 08acdf9a2615 ("x86/mwait-idle: disable IBRS during long idle")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/idle: Remove MFENCEs for CLFLUSH_MONITOR

Commit 48d32458bcd4 ("x86, idle: add barriers to CLFLUSH workaround") was
inherited from Linux and added MFENCEs around the AAI65 errata fix.

The SDM now states:

  Executions of the CLFLUSH instruction are ordered with respect to each
  other and with respect to writes, locked read-modify-write instructions,
  and fence instructions[1].

with footnote 1 reading:

  Earlier versions of this manual specified that executions of the CLFLUSH
  instruction were ordered only by the MFENCE instruction.  All processors
  implementing the CLFLUSH instruction also order it relative to the other
  operations enumerated above.

I.e. the MFENCEs came about because of an incorrect statement in the SDM.

The Spec Update (no longer available on Intel's website) simply says "issue a
CLFLUSH", with no mention of MFENCEs.

As this erratum is specific to Intel, it's fine to remove the MFENCEs; AMD
CPUs of a similar vintage do sport otherwise-unordered CLFLUSHs.

Move the feature bit into the BUG range (rather than FEATURE), and move the
workaround into monitor() itself.

The erratum check itself must use setup_force_cpu_cap().  It needs activating
if any CPU needs it, not if all of them need it.

Fixes: 48d32458bcd4 ("x86, idle: add barriers to CLFLUSH workaround")
Fixes: 96d1b237ae9b ("x86/Intel: work around Xeon 7400 series erratum AAI65")
Link: https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/idle: Move monitor()/mwait() wrappers into cpu-idle.c

They're not used by any other translation unit, so shouldn't live in
asm/processor.h, which is included almost everywhere.

Our new toolchain baseline knows the MONITOR/MWAIT instructions, so use them
directly rather than using raw hex.

Change the hint/extension parameters from long to int. They're specified to
remain 32-bit operands even in 64-bit mode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/svm: Revert 1->true conversion in svm_asid_handle_vmrun()

This is literally ASID 1 (of 2^16), not a boolean.

Fixes: 2f09f797ba43 ("x86/svm: Drop the suffix _guest from vmcb bit")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/efi: Drop stale #undef buffer

The "buffer" macro was removed when keyhandler_scratch was removed.

Fixes: 59e087bf6a9c ("xen/keyhandler: Drop keyhandler_scratch")
Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

xen/common: Guard freeze/thaw_domains functions with CONFIG_SYSTEM_SUSPEND

This patch adds CONFIG_SYSTEM_SUSPEND guards around freeze_domains
and thaw_domains functions.

This ensures they are only compiled into the hypervisor when the system
suspend functionality is enabled, aligning their inclusion with their
specific use case.

This addresses two Misra Rule 2.1 violations.

Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/efi: Show error message for EFI_INVALID_PARAMETER error

Show a string message instead of the error code.
This was encountered while trying different ways to boot Xen, specifically
when loading xen.efi using the GRUB2 "linux" command.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

xen: Rename bootmodule{,s} to boot_module{,s}

... in alignment with the new coding style on word splitting for type
names.

This aligns its name with the largely duplicate boot_module struct
in x86. While there's no equivalent to "struct bootmodules" in x86,
changing one and not the other is just confusing. Same with various
comments and function names.

Rather than making a long subfield name even longer, remove the
_bootmodule suffix in the kernel, initrd and dtb subfields.

Not a functional change.

Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-By: Daniel P. Smith <dpsmith@apertussolutions.com>

CODING_STYLE: Custom type names must be snake-cased by word

There's an unwritten convention of splitting type names using
underscores. Add this convention to CODING_STYLE to make it
common and less unwritten.

Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

docs: Introduce a default .readthedocs.yaml

Read The Docs now requires a configuration file, which is awkward when using
RTD to render proposed changes on the list.

Provide the minimal configuration file possible, sacrificing all
reproducibility in order to hopefully not need to touch it moving forwards.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/domain: fix memory leak in domain_create()

Fix a potential memory leak in domain_create() in the late hardware domain case.

Fixes: b959f3b820f5 ("xen: introduce hardware domain create flag")
Signed-off-by: Denis Mukhin <dmukhin@ford.com>

xen/efi: Do not check kernel signature if it was embedded

Using UKI it's possible to embed the Linux kernel into the xen.efi file.
In this case the signature for Secure Boot is applied to the
whole xen.efi, including the kernel.
So checking for a specific signature for the kernel is not
needed.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

x86/pmstat: restore changes lost by "consolidation"

Both c6e0a5539623 ("cpufreq: use existing local var in
cpufreq_statistic_init()") and a1ce987411f6 ("cpufreq: don't leave stale
statistics pointer") were lost in the course of "moving" the code,
presumably due to overly lax re-basing.

Fixes: bf0cd071db2a ("xen/pmstat: consolidate code into pmstat.c")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen: fix header guard generation for asm-generic headers

Dashes were wrongly not translated into underscores, thus generating
an unexpected guard identifier.

Fixes: ee79f378311b ("xen: add header guards to generated asm generic headers")
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen: add header guards to generated asm generic headers

MISRA D4.10 requires proper header guards to be in place in all header
files. Add header guards for generated asm generic headers as well.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/dt: Remove loop in dt_read_number()

The DT spec declares only two number types for a property: u32 and u64,
as per Table 2.3 in Section 2.2.4. Remove unbounded loop and replace
with a switch statement. Default to a size of 1 cell in the nonsensical
size case, with a warning printed on the Xen console.
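
The resulting control flow can be sketched as below. This is an illustrative
Python model, not the Xen implementation: the buffer is assumed to hold
big-endian 32-bit cells, and the warning text is invented.

```python
import struct


def dt_read_number(cells, size):
    """Read a 1-cell (u32) or 2-cell (u64) big-endian number."""
    if size == 1:
        return struct.unpack_from(">I", cells)[0]
    if size == 2:
        return struct.unpack_from(">Q", cells)[0]
    # Nonsensical size: warn and fall back to reading a single cell.
    print(f"warning: unexpected number of cells ({size}), reading 1")
    return struct.unpack_from(">I", cells)[0]
```

For example, with the two cells 0x00000001 and 0x00000002, a size of 1 yields
1 and a size of 2 yields 0x100000002.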

Suggested-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

arm/mpu: Enable read/write to protection regions for arm32

Define prepare_selector(), read_protection_region() and
write_protection_region() for arm32. Also, define
GENERATE_{READ/WRITE}_PR_REG_OTHERS to access MPU regions from 32 to 254.

Enable pr_{get/set}_{base/limit}(), region_is_valid() for arm32.
Enable pr_of_addr() for arm32.

The maximum number of regions supported is 255 (which corresponds to the
maximum value in HMPUIR).

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Hari Limaye <hari.limaye@arm.com>
Tested-by: Hari Limaye <hari.limaye@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

arm/mpu: Define arm32 system registers

Fix the definition for HPRLAR.
Define the base/limit address registers to access the first 32 protection
regions.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Hari Limaye <hari.limaye@arm.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>

arm/mpu: Move the functions to arm64 specific files

prepare_selector(), read_protection_region() and write_protection_region()
differ significantly between arm32 and arm64. Thus, move these functions
to their sub-arch specific folder.

Also move the GENERATE_{WRITE/READ}_PR_REG_CASE macros, in order to
keep them in the same file as their usage and improve readability.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>

xen/x86: add missing noreturn attributes

The marked functions never return to their caller, but lack the
`noreturn' attribute.

Functions that never return should be declared with a `noreturn'
attribute.

The lack of `noreturn' causes a violation of MISRA C Rule 17.11 (not
currently accepted in Xen), and also Rule 2.1: "A project shall not
contain unreachable code". Depending on the compiler used and the
compiler optimization used, the lack of `noreturn' might lead to the
presence of unreachable code.

The usage of the noreturn attribute together with asmlinkage is only for
the benefit of the static analysis tools.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Victor Lira <victorm.lira@amd.com>
[stefano: improve commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

xen/arm: add missing noreturn attributes

The marked functions never return to their caller, but lack the
`noreturn' attribute.

Functions that never return should be declared with a `noreturn'
attribute.

The lack of `noreturn' causes a violation of MISRA C Rule 17.11 (not
currently accepted in Xen), and also Rule 2.1: "A project shall not
contain unreachable code". Depending on the compiler used and the
compiler optimization used, the lack of `noreturn' might lead to the
presence of unreachable code.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Victor Lira <victorm.lira@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/keyhandler: add missing noreturn attribute

Function `reboot_machine' does not return, but lacks the `noreturn'
attribute.

Functions that never return should be declared with a `noreturn'
attribute.

The lack of `noreturn' causes a violation of MISRA C Rule 17.11 (not
currently accepted in Xen), and also Rule 2.1: "A project shall not
contain unreachable code". Depending on the compiler used and the
compiler optimization used, the lack of `noreturn' might lead to the
presence of unreachable code.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Victor Lira <victorm.lira@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

x86/hvmloader: select xen platform pci MMIO BAR UC or WB MTRR cache attribute

The Xen platform PCI device (vendor ID 0x5853) exposed to x86 HVM guests
doesn't have the functionality of a traditional PCI device.  The exposed
MMIO BAR is used by some guests (including Linux) as a safe place to map
foreign memory, including the grant table itself.

Traditionally BARs from devices have the uncacheable (UC) cache attribute
from the MTRR, to ensure correct functionality of such devices.  hvmloader
mimics this behavior and sets the MTRR attributes of both the low and high
PCI MMIO windows (where BARs of PCI devices reside) as UC in MTRR.

This however causes performance issues for users of the Xen platform PCI
device BAR, as for the purposes of mapping remote memory there's no need to
use the UC attribute.  On Intel systems this is worked around by using
iPAT, that allows the hypervisor to force the effective cache attribute of
a p2m entry regardless of the guest PAT value.  AMD however doesn't have an
equivalent of iPAT, and guest PAT values are always considered.

Linux commit:

41925b105e34 xen: replace xen_remap() with memremap()

attempted to mitigate this by forcing mappings of the grant-table to use
the write-back (WB) cache attribute.  However Linux memremap() takes MTRRs
into account to calculate which PAT type to use, and seeing the MTRR cache
attribute for the region being UC the PAT also ends up as UC, regardless of
the caller having requested WB.

As a workaround to allow current Linux to map the grant-table as WB using
memremap() introduce an xl.cfg option (xen_platform_pci_bar_uc=0) that can
be used to select whether the Xen platform PCI device BAR will have the UC
attribute in MTRR.  Such workaround in hvmloader should also be paired with
a fix for Linux so it attempts to change the MTRR of the Xen platform PCI
device BAR to WB by itself.

Overall, the long term solution would be to provide the guest with a safe
range in the guest physical address space where mappings to foreign pages
can be created.

Some vif throughput performance figures provided by Anthoine from 8 vCPU,
4GB RAM HVM guest(s) running on AMD hardware:

Without this patch:
vm -> dom0: 1.1Gb/s
vm -> vm:   5.0Gb/s

With the patch:
vm -> dom0: 4.5Gb/s
vm -> vm:   7.0Gb/s

Reported-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com> # hvmloader
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

x86/HVM: restrict use of pinned cache attributes as well as associated flushing

We don't permit use of uncachable memory types elsewhere unless a domain
meets certain criteria. Enforce this also during registration of pinned
cache attribute ranges.

Furthermore restrict cache flushing to just
- registration of uncachable ranges,
- de-registration of cachable ranges.

While there, also (mainly by calling memory_type_changed())
- take CPU self-snoop as well as IOMMU snoop into account (albeit the
latter still is a global property rather than a per-domain one),
- avoid flushes when the domain isn't running yet (which ought to be the
common case).
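
The flushing rule described above can be modelled as a small predicate. This
is a simplified sketch with hypothetical names: it leaves out the self-snoop
and IOMMU snoop considerations, and assumes the caller has already classified
the memory type as cacheable or not.

```python
def needs_flush(registering, cacheable, domain_running):
    """Decide whether a pinned cache attribute change warrants a flush."""
    if not domain_running:
        # Nothing can be in the caches on behalf of this domain yet.
        return False
    if registering:
        # Only registration of uncacheable ranges needs a flush.
        return not cacheable
    # Only de-registration of cacheable ranges needs a flush.
    return cacheable
```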

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86/pmstat: correct PMSTAT_get_pxstat buffer size checking

min(pmpt->perf.state_count, op->u.getpx.total) == op->u.getpx.total can
be expressed differently as pmpt->perf.state_count >= op->u.getpx.total.
Copying when the two are equal is fine; (partial) copying when the state
count is larger than the number of array elements that a buffer was
allocated to hold is what - as per the comment - we mean to avoid. Drop
the use of min() again, but retain its effect for the subsequent copying
from pxpt->u.pt.
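
The equivalence, and the case where the old check was too strict, can be
checked with a few lines. This is a sketch with hypothetical function names;
it assumes, as the description suggests, that the condition guards the
rejection path.

```python
def old_rejects(state_count, total):
    """Condition as previously written, using min()."""
    return min(state_count, total) == total


def old_rejects_rewritten(state_count, total):
    """The same condition expressed without min()."""
    return state_count >= total


def intended_rejects(state_count, total):
    """Reject only when the buffer is too small to hold every state."""
    return state_count > total
```

The first two functions agree for all non-negative inputs; they differ from
the third only when state_count equals total, which is exactly the case where
copying is fine.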

Fixes: aa70996a6896 ("x86/pmstat: Check size of PMSTAT_get_pxstat buffers")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>

xen/cpufreq: normalize hwp driver check with hwp_active()

Instead of using the parameter passed in by the hypercall to identify the hwp
driver, use hwp_active(). do_get_pm_info() in the same file already uses
hwp_active() for the hwp driver check, so it is better to be consistent.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/AMD: Expand core frequency calculation for family 1Ah CPUs

AMD Family 1Ah CPUs need a different COF (Core Operating Frequency) formula,
due to a change in the PStateDef MSR layout in AMD Family 1Ah.
In AMD Family 1Ah, the core current operating frequency in MHz is calculated as
follows:
CoreCOF = Core::X86::Msr::PStateDef[CpuFid[11:0]] * 5MHz

We introduce a helper amd_parse_freq() to parse the COF from the PStateDef
register, replacing the original macro FREQ(v).
amd_parse_freq() is declared as const, as it mainly consists of mathematical
computation.
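
The Family 1Ah formula amounts to the following. This is a sketch of that one
formula only, in Python: the function name is hypothetical, and the handling
of other families done by the real helper is not shown.

```python
def core_cof_mhz_fam1ah(pstate_def):
    """Core operating frequency in MHz from a Family 1Ah PStateDef value."""
    cpu_fid = pstate_def & 0xFFF  # CpuFid[11:0]
    return cpu_fid * 5            # 5MHz per CpuFid step
```

For instance, a CpuFid of 0x190 (400) corresponds to 2000MHz, independent of
the other bits in the register value.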

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/cpufreq: move "init" flag into common structure

AMD cpufreq cores will be initialized in one of two modes: legacy P-state mode
or CPPC mode. So the "init" flag shall be extracted from the px-specific
"struct xen_processor_perf" and placed in the common
"struct processor_pminfo". Otherwise, when later introducing a new
sub-hypercall to propagate CPPC data, we would need to pass the irrelevant
px-specific "struct xen_processor_perf" just to set the init flag.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/arm: add support for R-Car Gen4 PCI host controller

Add support for Renesas R-Car Gen4 PCI host controller, specifically
targeting the S4 and V4H SoCs. The implementation includes configuration
read/write operations for both root and child buses. For accessing the
child bus, iATU is used for address translation.

The host controller needs to be initialized by Dom0 first to be properly
handled by Xen. Xen itself only handles the runtime configuration of
the iATU for accessing different child devices.

iATU programming is done similarly to Linux, where only window 0 is used
for dynamic configuration, and it is reconfigured for every config space
read/write.

Code common to all DesignWare PCI host controllers is located in a
separate file to allow for easy reuse in other DesignWare-based PCI
host controllers.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: add support for PCI child bus

PCI host bridges often have different ways to access the root and child
bus configuration spaces. One example is the DesignWare host bridge
and its multiple clones [1].

The Linux kernel implements this by instantiating a child bus when device
drivers provide not only the usual pci_ops to access ECAM space (this is
the case for the generic host bridge), but also a means to access the child
bus, which has a dedicated configuration space and its own implementation for
accessing the bus, e.g. child_ops.

For Xen it is not feasible to fully implement the PCI bus infrastructure as
the Linux kernel does, but a child bus can still be supported.

Add support for the PCI child bus which includes the following changes:
- introduce bus mapping functions depending on SBDF
- assign bus start and end for the child bus and re-configure the same for
the parent (root) bus
- make pci_find_host_bridge be aware of multiple busses behind the same bridge
- update pci_host_bridge_mappings, so it also doesn't map to guest the memory
spaces belonging to the child bus
- make pci_host_common_probe accept one more pci_ops structure for the child bus
- install MMIO handlers for the child bus for hardware domain
- re-work vpci_mmio_{write|read} with parent and child approach in mind

[1] https://elixir.bootlin.com/linux/v5.15/source/drivers/pci/controller/dwc

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: make pci_host_common_probe return the bridge

Some of the PCI host bridges require additional processing during the
probe phase. For that they need to access struct bridge of the probed
host, so return pointer to the new bridge from pci_host_common_probe.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: allow PCI host bridge to have private data

Some of the PCI host bridges require private data. Add priv field
to struct pci_host_bridge, so such bridges may populate it with
their private data.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

tools/arm: exclude iomem from domU extended regions

When a device is passed through to an xl domU, the iomem ranges may
overlap with the extended regions. Remove iomem from extended regions.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

xen/arm: exclude xen,reg from direct-map domU extended regions

Similarly to fba1b0974dd8, when a device is passed through to a
direct-map dom0less domU, the xen,reg ranges may overlap with the
extended regions. Remove xen,reg from direct-map domU extended regions.

Take the opportunity to update the comment ahead of find_memory_holes().

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

MAINTAINERS: update my email address

Change rosbrookn@gmail.com -> enr0n@ubuntu.com

Signed-off-by: Nick Rosbrook <rosbrookn@gmail.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

automation: disable terminal echo in xilinx test scripts

The default terminal settings in Linux will enable echo which interferes with
these tests. Set the value in the script to avoid failure caused by a settings
reset.

Signed-off-by: Victor Lira <victorm.lira@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/domain: rewrite emulation_flags_ok()

Rewrite emulation_flags_ok() to simplify future modifications.

No functional change intended.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

console: Do not duplicate early printk messages on conring flush

Commit f6d1bfa16052 introduced flushing conring in console_init_preirq().
However, when CONFIG_EARLY_PRINTK is enabled, the early boot messages
had already been sent to serial before main console initialization. This
results in all the early boot messages being duplicated.

Change conring_flush() to accept an argument listing the devices to which
to flush conring. We don't want to send to serial at console initialization
when using early printk, but we want these messages to be sent at conring
dump triggered by keyhandler.
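
As a minimal sketch of the idea (not the actual Xen code), the flush can
take a bitmask of target devices. The flag names, the sinks structure and
both helpers below are hypothetical and exist only for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device flags; the names are illustrative only. */
#define FLUSH_SERIAL (1u << 0)
#define FLUSH_VIDEO  (1u << 1)
#define FLUSH_ALL    (FLUSH_SERIAL | FLUSH_VIDEO)

/* Byte counters standing in for the real output devices. */
struct sinks {
    size_t serial;
    size_t video;
};

/* Flush 'len' buffered bytes to every device selected in 'devs'. */
static void ring_flush(struct sinks *s, size_t len, unsigned int devs)
{
    if ( devs & FLUSH_SERIAL )
        s->serial += len;
    if ( devs & FLUSH_VIDEO )
        s->video += len;
}

/* Devices to flush to at console initialisation. */
static unsigned int init_flush_devs(bool early_printk)
{
    /* Early printk already sent the boot messages to serial. */
    return early_printk ? (FLUSH_ALL & ~FLUSH_SERIAL) : FLUSH_ALL;
}
```

A keyhandler-style dump would pass FLUSH_ALL, so serial still receives the
ring contents on request.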

Fixes: f6d1bfa16052 ("xen/console: introduce conring_flush()")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

tools/golang: Regenerate bindings for trap_unmapped_accesses

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Acked-by: Nick Rosbrook <rosbrookn@gmail.com>

tools/ocaml: Update bindings for CDF_TRAP_UNMAPPED_ACCESSES

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>

tools/arm: Add the trap_unmapped_accesses xl config option

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: dom0less: Add trap-unmapped-accesses

Add the trap-unmapped-accesses per-domain fdt property.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Add way to disable traps on accesses to unmapped addresses

Add a per-domain way to optionally disable traps for accesses
to unmapped addresses.

The domain flag is general but it's only implemented for Arm for now.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Fix P2M root page tables invalidation

Fix the condition part of the for loop in p2m_invalidate_root() that
uses P2M_ROOT_LEVEL instead of P2M_ROOT_PAGES. The goal here is to
invalidate all root page tables (that can be concatenated), so the loop
must iterate through all these pages. Root level can be 0 or 1, whereas
there can be 1, 2, 8 or 16 root pages. The issue may lead to some pages
not being invalidated, and therefore guest accesses to them won't be
trapped. We use it to track pages accessed by the guest for set/way
emulation when there is no IOMMU, the IOMMU is not enabled for the domain,
or the P2M is not shared with the IOMMU.
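
The effect of the wrong loop bound can be illustrated with a small
standalone sketch. ROOT_LEVEL and ROOT_PAGES are hypothetical stand-ins
for P2M_ROOT_LEVEL and P2M_ROOT_PAGES with assumed values, and the helper
is not the real p2m_invalidate_root():

```c
#include <stdbool.h>

/* Hypothetical stand-ins for P2M_ROOT_LEVEL and P2M_ROOT_PAGES. */
enum { ROOT_LEVEL = 1, ROOT_PAGES = 8 };

/* Mark the first 'bound' root pages invalid; return how many stay valid. */
static unsigned int invalidate_root(bool valid[ROOT_PAGES], unsigned int bound)
{
    unsigned int i, still_valid = 0;

    for ( i = 0; i < bound && i < ROOT_PAGES; i++ )
        valid[i] = false;

    for ( i = 0; i < ROOT_PAGES; i++ )
        still_valid += valid[i];

    return still_valid;
}
```

Bounding the loop by the level leaves most of the concatenated root pages
untouched, whereas bounding it by the page count covers all of them.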

Fixes: 2148a125b73b ("xen/arm: Track page accessed between batch of Set/Way operations")
Reported-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

arm/mpu: Provide and populate MPU C data structures

Modify Arm32 assembly boot code to reset any unused MPU region, initialise
'max_mpu_regions' with the number of supported MPU regions and set/clear the
bitmap 'xen_mpumap_mask' used to track the enabled regions.

Introduce cache.S to hold arm32 cache related functions.

Use the macro definition for "dcache_line_size" from linux.

Change the order of registers in prepare_xen_region() as 'strd' instruction
is used to store {prbar, prlar} in arm32. Thus, 'prbar' has to be an
even-numbered register and 'prlar' the register immediately following it.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

arm/mpu: Introduce MPU memory region map structure

Introduce pr_t typedef which is a structure having the prbar and prlar members,
each being structured as the registers of the AArch32 Armv8-R architecture.

Also, define MPU_REGION_RES0 to 0 as there are no reserved 0 bits beyond the
BASE or LIMIT bitfields in prbar or prlar respectively.

In pr_of_addr(), enclose prbar and prlar arm64 specific bitfields with
appropriate macros, so that this function can later be reused for arm32 as
well.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

xen/riscv: introduce register_intc_ops() and intc_hw_ops

Introduce the intc_hw_operations structure to encapsulate interrupt
controller-specific data and operations. This structure includes:
- A pointer to interrupt controller information (`intc_info`)
- Callbacks to initialize the controller and set IRQ type/priority
- A reference to an interrupt controller descriptor (`host_irq_type`)
- The number of interrupt controller IRQs

Add function register_intc_ops() to register the above-mentioned structure.

Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/riscv: dt_processor_hartid() implementation

Implement dt_processor_hartid() to get the hart ID of the given
device tree node, checking that the CPU is available and that the given
device tree node has a proper riscv,isa property.

As a helper function, dt_get_hartid() is introduced to deal specifically
with the reg property of a CPU device node.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/riscv: rework asm/mm.h and asm/page.h includes to match other architectures

To align with other architectures where <asm/page.h> is included from <asm/mm.h>
(and not the other way around), the following changes are made:
- Since <asm/mm.h> is no longer included in <asm/page.h>:
  - Move the definitions of paddr_to_pte() and pte_to_paddr() to <asm/mm.h>,
    as paddr_to_pfn() and pte_to_paddr() are already defined there.
  - Move _vmap_to_mfn() to <asm/mm.h> because mfn_from_pte() is defined there and
    open-code it inside macros vmap_to_mfn().
- Drop the inclusion of <xen/domain_page.h> from <asm/page.h> to resolve a compilation error:
    ./include/xen/domain_page.h:63:12: error: implicit declaration of function '__mfn_to_virt'; did you mean 'mfn_to_nid'? [-Werror=implicit-function-declaration]
       63 |     return __mfn_to_virt(mfn_x(mfn));
  This happens because __mfn_to_virt() is defined in <asm/mm.h>, but due to
  the current include chain:
    <xen/domain.h>
      <asm/domain.h>
        <xen/mm.h>
          <asm/mm.h>
            <asm/page.h>
              <xen/domain_page.h>
                static inline void *map_domain_page_global(mfn_t mfn)
                {
                    return __mfn_to_virt(mfn_x(mfn));
                }
            ...
          ...
          #define __mfn_to_virt() ...

  This leads to a circular dependency and the build error above.

  As a result, since <xen/domain_page.h> is no longer included in
  <asm/page.h>, the flush_page_to_ram() definition cannot remain there.
  It is now moved to riscv/mm.c.

Including <asm/page.h> from <asm/mm.h> does not cause issues with the
declaration/definition of clear_page() when <xen/mm.h> is included, and
also prevents build errors such as:
  common/domain.c: In function 'alloc_domain_struct':
  common/domain.c:797:5: error: implicit declaration of function 'clear_page'; did you mean 'steal_page'? [-Werror=implicit-function-declaration]
  797 |     clear_page(d);
      |     ^~~~~~~~~~
      |     steal_page
caused by using clear_page() in common/domain.c.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: Preinitialise all modules to be of kind BOOTMOD_UNKNOWN

A later patch removes boot_module and replaces its uses with bootmodule.
The equivalent field for "type" doesn't have BOOTMOD_UNKNOWN as a zero
value, so it must be explicitly set in the static xen_boot_info.

Not a functional change.

Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>

arm/gnttab: Remove xen/grant_table.h cyclic include

The way they currently include each other, with one of the includes
being conditional on CONFIG_GRANT_TABLE, makes it hard to know which
contents are included when.

Break the cycle by removing the asm/grant_table.h include.

Not a functional change.

Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

drivers/pci: Add minor comment for maximum capability constant

The comment is similar to extended capabilities in the function
below.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm: Process pending softirqs while dumping VMC[SB]s

24 guests with 8 vcpus each is sufficient to hit a 5 second watchdog.

Drop a piece of trailing whitespace while here.

Reported-by: Aidan Allen <aidan.allen1@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Aidan Allen <aidan.allen1@cloud.com>

x86/boot: Fix domain_cmdline_size()

The early exit from domain_cmdline_size() is buggy. Even if there's no
bootloader cmdline and no kextra, there still might be Xen parameters to
forward, and therefore a nonzero cmdline length.

Explain what the function is doing, and rewrite it to be both more legible and
more extendible.
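
A minimal sketch of the sizing logic, assuming three optional inputs. The
helper names and the 'forwarded' parameter are hypothetical and are not
the actual Xen function signature:

```c
#include <stddef.h>
#include <string.h>

/* Length of s, treating NULL as an empty string. */
static size_t len_or_zero(const char *s)
{
    return s ? strlen(s) : 0;
}

/*
 * Hypothetical sizing helper: 'cmdline' is the bootloader string, 'kextra'
 * the extra kernel arguments, and 'forwarded' any parameters added on
 * behalf of the hypervisor. Returns 0 only when all three are empty.
 */
static size_t cmdline_size(const char *cmdline, const char *kextra,
                           const char *forwarded)
{
    size_t s = len_or_zero(cmdline) + len_or_zero(kextra) +
               len_or_zero(forwarded);

    return s ? s + 1 : 0; /* + 1 for the NUL terminator */
}
```

Summing every contribution before deciding whether the result is empty
avoids an early exit that overlooks one of the inputs.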

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/kernel: Move parse_params() back into __init

Its non-init caller was dropped in Xen 4.14.

No functional change.

Fixes: 02e9a9cf2095 ("xen: remove XEN_SYSCTL_set_parameter support")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

vVMX: use reg_read()

Let's avoid such open-coding. There's also no need to use
guest_cpu_user_regs(), when the function has a suitable parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT

This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

xen/domain: fix late hwdom feature

Fix get_initial_domain_id() which returns hardware_domid and thus breaks
late hwdom feature [1].

[1] https://lore.kernel.org/xen-devel/a4c860d7-1fa0-43f4-8ae1-af59b7c6506f@xen.org/

Fixes: f147ccf2b3c8 ("xen/consoled: clean up console handling for PV shim")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen: move declarations of device_tree_get_{reg,u32}() to xen/device_tree.h

The definitions of device_tree_get_reg() and device_tree_get_u32() are already
in common code, so move their prototypes there as well.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

xen/arm: fix build with HAS_PCI

In file included from ./include/xen/pci.h:72,
                 from drivers/pci/pci.c:8:
./arch/arm/include/asm/pci.h:131:50: error: ‘struct rangeset’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
  131 | static inline int pci_sanitize_bar_memory(struct rangeset *r)
      |                                                  ^~~~~~~~
cc1: all warnings being treated as errors

Fixes: 4acab25a9300 ("x86/vpci: fix handling of BAR overlaps with non-hole regions")
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

helpers: init-dom0less: Drop libxenevtchn from LDLIBS

It hasn't been used since the introduction of the script. Also drop
relevant CFLAGS and header inclusion.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

ACPI: adjust decl of acpi_set_pdc_bits()

The commit referenced below changed the type of the first parameter.
Misra C:2012 Rule 8.3 requires the declaration to follow suit.

Fixes: bf0cd071db2a ("xen/pmstat: consolidate code into pmstat.c")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Nicola Vetrini <nicola.vetrini@bugseng.com>

xen: Introduce system suspend config option

This option enables the system suspend support. This is the mechanism that
allows the system to be suspended to RAM and later resumed.

The patch introduces three options:
- HAS_SYSTEM_SUSPEND: indicates suspend support is available on the platform.
- SYSTEM_SUSPEND_ALWAYS_ON: used for architectures where suspend must always
  be enabled.
- SYSTEM_SUSPEND: user-facing option to enable/disable suspend if supported.
  Defaults to enabled if SYSTEM_SUSPEND_ALWAYS_ON is set and depends on
  HAS_SYSTEM_SUSPEND.

On x86, both HAS_SYSTEM_SUSPEND and SYSTEM_SUSPEND_ALWAYS_ON are selected by
default, making suspend support always enabled. The options are designed to
be easily extensible to other architectures (e.g., PPC, RISC-V) as future
support is added.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/dt: Add BOOTMOD_MICROCODE

In preparation for x86 to start using bootmodule instead of boot_module

Signed-off-by: Alejandro Vallejo <agarciav@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

xenalyze: Add 2 missed VCPUOPs in vcpu_op_str

The 2 missed ones are: register_runstate_phys_area and
register_vcpu_time_phys_area.

Fixes: d5df44275e7a ("domain: introduce GADDR based runstate area registration alternative")
Fixes: 60e544a8c58f ("x86: introduce GADDR based secondary time area registration alternative")
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen: make avail_domheap_pages() inlined into get_outstanding_claims()

Function avail_domheap_pages() is only invoked by get_outstanding_claims(),
so it could be inlined into get_outstanding_claims().
Move up avail_heap_pages() to avoid declaration before
get_outstanding_claims().

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/pmstat: consolidate code into pmstat.c

We move the following functions into drivers/acpi/pmstat.c, as they
are all designed for performance statistics:
- cpufreq_residency_update()
- cpufreq_statistic_reset()
- cpufreq_statistic_update()
- cpufreq_statistic_init()
- cpufreq_statistic_exit()
Consequently, variable "cpufreq_statistic_data" and "cpufreq_statistic_lock"
shall become static.
We also move out acpi_set_pdc_bits(), as it is the handler for sub-hypercall
XEN_PM_PDC, and shall stay with the other handlers together in
drivers/cpufreq/cpufreq.c.

Various style corrections shall be applied at the same time while moving these
functions, including:
- brace for if() and for() shall live on a separate line
- add extra space before and after bracket of if() and for()
- use array notation
- convert uint32_t into unsigned int
- convert u32 into uint32_t

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>

libxc/PM: Retry get_pxstat if data is incomplete

If the total returned by Xen is more than the number of elements
allocated, it means that the buffer was too small and so the data is
incomplete. Retry to get all the data.
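
The retry pattern can be sketched as follows. This is an illustration
only: query_fn, demo_query() and fetch_all() are hypothetical and do not
reflect the libxenctrl interface.

```c
#include <stdlib.h>

/* Hypothetical query: copies up to 'cap' entries, returns the total available. */
typedef unsigned int (*query_fn)(unsigned int *buf, unsigned int cap);

/* Demo source with five entries (0..4), used only for illustration. */
static unsigned int demo_query(unsigned int *buf, unsigned int cap)
{
    unsigned int i;

    for ( i = 0; i < cap && i < 5; i++ )
        buf[i] = i;

    return 5;
}

/* Fetch all entries, growing the buffer and retrying while it is too small. */
static unsigned int *fetch_all(query_fn query, unsigned int cap,
                               unsigned int *count)
{
    unsigned int *buf = NULL;

    for ( ; ; )
    {
        unsigned int *tmp = realloc(buf, (cap ? cap : 1) * sizeof(*buf));
        unsigned int total;

        if ( !tmp )
        {
            free(buf);
            return NULL;
        }
        buf = tmp;

        total = query(buf, cap);
        if ( total <= cap )
        {
            *count = total;
            return buf;
        }

        cap = total; /* buffer was too small: grow and retry */
    }
}
```

The caller owns the returned buffer and releases it with free().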

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

libxc/PM: Ensure pxstat buffers are correctly sized

xc_pm_get_pxstat() requires the caller to allocate the pt and trans_pt
buffers but then calls xc_pm_get_max_px() to determine how big they are
(and hence how much Xen will copy into them). This is susceptible to
races if xc_pm_get_max_px() changes so avoid the problem by requiring
the caller to also pass in the size of the buffers.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

cpufreq: Avoid potential buffer overrun and leak

If set_px_pminfo is called a second time with a larger state_count than
the first call, calls to PMSTAT_get_pxstat will read beyond the end of
the pt and trans_pt buffers allocated in cpufreq_statistic_init() since
they would have been allocated with the original state_count.

Secondly, the states array leaks on each subsequent call of
set_px_pminfo.

Fix both these issues by ignoring subsequent calls to set_px_pminfo if
it completed successfully previously. Return success rather than an
error to avoid errors in the dom0 kernel log when reloading the
xen_acpi_processor module.

At the same time, fix a leak of the states array on error.
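
A rough sketch of the "ignore repeated initialisation" approach, using a
hypothetical px_info structure and set_px() helper rather than the real
set_px_pminfo() code:

```c
#include <stdlib.h>

/* Hypothetical per-CPU state; field names are illustrative only. */
struct px_info {
    unsigned int state_count;
    unsigned int *states;
    int init;
};

/* Set up the states array once; later calls are ignored and report success. */
static int set_px(struct px_info *px, unsigned int count)
{
    if ( px->init )
        return 0; /* already initialised: keep the original sizing */

    px->states = calloc(count, sizeof(*px->states));
    if ( !px->states )
        return -1;

    px->state_count = count;
    px->init = 1;

    return 0;
}
```

Because the second call returns before allocating, the array is neither
leaked nor left out of step with buffers sized from the first count.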

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

public/sysctl: Clarify usage of pm_{px,cx}_stat

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/pmstat: Check size of PMSTAT_get_pxstat buffers

Check that the total number of states passed in and hence the size of
buffers is sufficient to avoid writing more than the caller has
allocated.

The interface is not explicit about whether getpx.total is expected to
be set by the caller in this case but since it is always set in
libxenctrl it seems reasonable to check it and make it explicit.

Fixes: c06a7db0c547 ("X86 and IA64: Update cpufreq statistic logic for supporting both x86 and ia64")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vpci/header: Emulate legacy capability list for dom0

The current logic for emulating the legacy capability list applies only
to domU. Extend it to dom0 too, which will make it easy to hide a
capability whose initialization fails.

Also restrict adding the PCI_STATUS register to domU, since dom0 has no
limitation on accessing that register.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

drivers/char: remove outdated comment in xhci driver

The input handling is already implemented, and that limitation is not
there anymore.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

SUPPORT.md: mark xenstore live update as supported

Live update of xenstored is available since Xen 4.15 and it is tested
on a regular basis since then.

Switch the live update support from "Tech Preview" to "Supported".

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

tests/vpci: Use $(CC) instead of $(HOSTCC)

Depending on the build environment, HOSTCC can be different than CC. With
the recent `install` rule addition, this would put a binary of a wrong
format in the destdir (e.g. building tests on x86 host for Arm target).

Take the opportunity to adjust the `run` rule to only run the test if
HOSTCC is CC, else print a warning message.

Fixes: 96a587a05736 ("tools/tests: Add install target for vPCI")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

xen/arm: Standardize R-Car platform Kconfig descriptions

Change "RCar3/RCar4" to "R-Car Gen3/Gen4" to match official Renesas branding.
Aligns with documentation and industry-standard terminology.

Signed-off-by: Jahan Murudi <jahan.murudi.zg@renesas.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>