Scan with filtering on projected field rerurn empty table

### Apache Iceberg version

0.9.1 (latest release)

### Please describe the bug 🐞

I've noticed than when including filters on projected fields (not included in parquet data files) scan return empty result.

Spark implementation produce correct result here.

I did a bit of digging and it looks like the problem is somewhere in `ds.Scanner.from_fragment` function call [here](https://github.com/apache/iceberg-python/blob/c4679570b4b9b909c9bd602fbf919385998e1760/pyiceberg/io/pyarrow.py#L1421) because it produces empty batch iterator.
I've noticed that `translated_row_filter` and `bound_file_filter` are `AlwaysFalse` (due to `VendorId` missing from `file_schema`).
Maybe filters for projected columns could be excluded from filter somehow.

Second issue might be that batch is first [filtred](https://github.com/apache/iceberg-python/blob/c4679570b4b9b909c9bd602fbf919385998e1760/pyiceberg/io/pyarrow.py#L1447) and only than projected column is [added](https://github.com/apache/iceberg-python/blob/c4679570b4b9b909c9bd602fbf919385998e1760/pyiceberg/io/pyarrow.py#L1465).

I would like to contribute but some directions would be nice.

Here is self-contained script to reproduce the issue:

```python

#!/usr/bin/env python
import polars as pl
from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg import expressions as expr
from pyiceberg.transforms import IdentityTransform
from pyiceberg.table import _parquet_files_to_data_files
from pyiceberg.table.name_mapping import NameMapping, MappedField
from pyiceberg.io.pyarrow import pyarrow_to_schema

catalog = load_catalog(
    "default",
    **{
        "type": "in-memory",
        "warehouse": "warehouse/",
    }
)

catalog.create_namespace_if_not_exists("default")

# write filtered data 
file = "warehouse/VendorID=1_yellow_tripdata_2025-01.parquet"
df = pl.read_parquet("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet")
df.filter(pl.col("VendorID") == 1).drop("VendorID").write_parquet(file)

# create iceberg table
df = df.to_arrow()
schema = df.schema
mapping = NameMapping([MappedField(field_id=i,names=[name]) for i, name in enumerate(schema.names, 1)])
schema = pyarrow_to_schema(schema, mapping)
table = catalog.create_table_if_not_exists(
    "default.taxi",
    schema=schema,
    partition_spec=PartitionSpec(
        PartitionField(source_id=schema.find_field("VendorID").field_id, field_id=schema.find_field("VendorID").field_id, transform=IdentityTransform(), name="VendorID"),
    ),
    properties={"schema.name-mapping.default": mapping.model_dump_json()},
)

# add_files
files = [file]
with table.transaction() as tx:
    with tx.update_snapshot().fast_append() as fast_append:
        data_files = _parquet_files_to_data_files(
            table_metadata=tx.table_metadata, file_paths=files, io=tx._table.io
        )
        for data_file in data_files:
            # set partition for VendorID
            # current file has only one partition anyway
            # VendorID = 1
            data_file.partition[0] = 1
            fast_append.append_data_file(data_file)

# query with projected field in predicate
scan = table.scan(row_filter=expr.EqualTo("VendorID", 1))
print(f"WITH PROJECTED COLUMN LEN: {len(scan.to_arrow())}")

# query without projected field in predicate
scan = table.scan()
print(f"WITHOUT PROJECTED COLUM LEN: {len(scan.to_arrow())}")

```

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Scan with filtering on projected field rerurn empty table #2028

Apache Iceberg version

Please describe the bug 🐞

Willingness to contribute

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Scan with filtering on projected field rerurn empty table #2028

Description

Apache Iceberg version

Please describe the bug 🐞

Willingness to contribute

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions